Linus Torvalds <torva...@linux-foundation.org> writes:

> On Mon, May 24, 2021 at 3:38 AM Aneesh Kumar K.V
> <aneesh.ku...@linux.ibm.com> wrote:
>>
>> Avoid the above race with MOVE_PMD by holding pte ptl in mremap and waiting 
>> for
>> parallel pagetable walk to finish operating on pte before updating new_pmd
>
> Ack on the concept.


Should we worry about the below race. The window would be small

CPU 1                           CPU 2                                   CPU 3

mremap(old_addr, new_addr)      page_shrinker/try_to_unmap_one

mmap_write_lock_killable()

                                addr = old_addr

lock(pmd_ptl)
pmd = *old_pmd
pmd_clear(old_pmd)
flush_tlb_range(old_addr)

lock(pte_ptl)
*new_pmd = pmd
unlock(pte_ptl)

unlock(pmd_ptl)
                                lock(pte_ptl)
                                                                        
*new_addr = 10; and fills
                                                                        TLB 
with new addr
                                                                        and old 
pfn

                                ptep_clear_flush(old_addr)
                                old pfn is free.
                                                                        Stale 
TLB entry



>
> However, not so much on the patch.
>
> Odd whitespace change:
>
>> @@ -254,6 +254,7 @@ static bool move_normal_pmd(struct vm_area_struct *vma, 
>> unsigned long old_addr,
>>         if (WARN_ON_ONCE(!pmd_none(*new_pmd)))
>>                 return false;
>>
>> +
>>         /*
>>          * We don't have to worry about the ordering of src and dst
>>          * ptlocks because exclusive mmap_lock prevents deadlock.
>
> And new optimization for empty pmd, which seems unrelated to the
> change and should presumably be separate:

That was added that we can safely do pte_lockptr() below

>
>> @@ -263,6 +264,10 @@ static bool move_normal_pmd(struct vm_area_struct *vma, 
>> unsigned long old_addr,
>>         if (new_ptl != old_ptl)
>>                 spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
>>
>> +       if (pmd_none(*old_pmd))
>> +               goto unlock_out;
>> +
>> +       pte_ptl = pte_lockptr(mm, old_pmd);
>>         /* Clear the pmd */
>>         pmd = *old_pmd;
>>         pmd_clear(old_pmd);
>
> And also, why does the above assign 'pte_ptl' without using it, when
> the actual use is ten lines further down?

So that we fetch the pte_ptl before we clear old_pmd. 


-aneesh

Reply via email to