On 04/12/2017 03:33 PM, Vlastimil Babka wrote:
> On 03/02/2017 04:10 PM, Kirill A. Shutemov wrote:
>> In the prot_numa case we are under down_read(mmap_sem). It is critical
>> not to clear the pmd intermittently, to avoid racing with MADV_DONTNEED,
>> which also runs under down_read(mmap_sem):
>>
>>      CPU0:                           CPU1:
>>                              change_huge_pmd(prot_numa=1)
>>                               pmdp_huge_get_and_clear_notify()
>> madvise_dontneed()
>>  zap_pmd_range()
>>   pmd_trans_huge(*pmd) == 0 (without ptl)
>>   // skip the pmd
>>                               set_pmd_at();
>>                               // pmd is re-established
>>
>> The race makes MADV_DONTNEED miss the huge pmd and fail to clear it,
>> which may break userspace.
>>
>> Found by code analysis; never seen triggered in practice.
>>
>> Signed-off-by: Kirill A. Shutemov <[email protected]>
>> ---
>>  mm/huge_memory.c | 34 +++++++++++++++++++++++++++++++++-
>>  1 file changed, 33 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index e7ce73b2b208..bb2b3646bd78 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -1744,7 +1744,39 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>>      if (prot_numa && pmd_protnone(*pmd))
>>              goto unlock;
>>  
>> -    entry = pmdp_huge_get_and_clear_notify(mm, addr, pmd);
>> +    /*
>> +     * In the prot_numa case we are under down_read(mmap_sem). It is
>> +     * critical not to clear the pmd intermittently, to avoid racing
>> +     * with MADV_DONTNEED, which also runs under down_read(mmap_sem):
>> +     *
>> +     *      CPU0:                           CPU1:
>> +     *                              change_huge_pmd(prot_numa=1)
>> +     *                               pmdp_huge_get_and_clear_notify()
>> +     * madvise_dontneed()
>> +     *  zap_pmd_range()
>> +     *   pmd_trans_huge(*pmd) == 0 (without ptl)
>> +     *   // skip the pmd
>> +     *                               set_pmd_at();
>> +     *                               // pmd is re-established
>> +     *
>> +     * The race makes MADV_DONTNEED miss the huge pmd and fail to
>> +     * clear it, which may break userspace.
>> +     *
>> +     * pmdp_invalidate() is required to make sure we don't miss
>> +     * dirty/young flags set by hardware.
>> +     */
>> +    entry = *pmd;
>> +    pmdp_invalidate(vma, addr, pmd);
>> +
>> +    /*
>> +     * Recover the dirty/young flags. This relies on pmdp_invalidate()
>> +     * not corrupting them.
>> +     */
> 
> pmdp_invalidate() does:
> 
>         pmd_t entry = *pmdp;
>         set_pmd_at(vma->vm_mm, address, pmdp, pmd_mknotpresent(entry));
> 
> so it's not atomic and if CPU sets dirty or accessed in the middle of
> this, they will be lost?
> 
> But I don't see how the other invalidate caller
> __split_huge_pmd_locked() deals with this either. Andrea, any idea?

Looks like we never resolved this, and meanwhile the patch has landed in
mainline as ced108037c2aa. CC'ing Andy, who deals with TLB issues a lot
these days.

> Vlastimil
> 
>> +    if (pmd_dirty(*pmd))
>> +            entry = pmd_mkdirty(entry);
>> +    if (pmd_young(*pmd))
>> +            entry = pmd_mkyoung(entry);
>> +
>>      entry = pmd_modify(entry, newprot);
>>      if (preserve_write)
>>              entry = pmd_mk_savedwrite(entry);
>>
> 