On Thu, 5 May 2016 20:42:56 +0800 Zhou Chengming <[email protected]> 
wrote:

> A concurrency issue about KSM in the function scan_get_next_rmap_item.
> 
> task A (ksmd):                                |task B (the mm's task):
>                                       |
> mm = slot->mm;                                |
> down_read(&mm->mmap_sem);             |
>                                       |
> ...                                   |
>                                       |
> spin_lock(&ksm_mmlist_lock);          |
>                                       |
> ksm_scan.mm_slot go to the next slot; |
>                                       |
> spin_unlock(&ksm_mmlist_lock);                |
>                                       |mmput() ->
>                                       |       ksm_exit():
>                                       |
>                                       |spin_lock(&ksm_mmlist_lock);
>                                       |if (mm_slot && ksm_scan.mm_slot != 
> mm_slot) {
>                                       |       if (!mm_slot->rmap_list) {
>                                       |               easy_to_free = 1;
>                                       |               ...
>                                       |
>                                       |if (easy_to_free) {
>                                       |       mmdrop(mm);
>                                       |       ...
>                                       |
>                                       |So this mm_struct will be freed 
> successfully.
>                                       |
> up_read(&mm->mmap_sem);                       |
> 
> As we can see above, the ksmd thread may access a mm_struct that already
> been freed to the kmem_cache.
> Suppose a fork will get this mm_struct from the kmem_cache, the ksmd thread
> then call up_read(&mm->mmap_sem), will cause mmap_sem.count to become -1.
> I changed the scan_get_next_rmap_item function refered to the khugepaged
> scan function.

Thanks.

We need to decide whether this fix should be backported into earlier
(-stable) kernels.  Can you tell us how easily this is triggered and
share your thoughts on this?

Reply via email to