mprotect: do not flush on permission promotion

Nadav Amit Sun, 31 Jan 2021 22:38:35 -0800

> On Jan 31, 2021, at 4:10 AM, Andrew Cooper <andrew.coop...@citrix.com> wrote:
> 
> On 31/01/2021 01:07, Andy Lutomirski wrote:
>> Adding Andrew Cooper, who has a distressingly extensive understanding
>> of the x86 PTE magic.
> 
> Pretty sure it is all learning things the hard way...
> 
>> On Sat, Jan 30, 2021 at 4:16 PM Nadav Amit <nadav.a...@gmail.com> wrote:
>>> diff --git a/mm/mprotect.c b/mm/mprotect.c
>>> index 632d5a677d3f..b7473d2c9a1f 100644
>>> --- a/mm/mprotect.c
>>> +++ b/mm/mprotect.c
>>> @@ -139,7 +139,8 @@ static unsigned long change_pte_range(struct mmu_gather *tlb,
>>>                                ptent = pte_mkwrite(ptent);
>>>                        }
>>>                        ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
>>> -                       tlb_flush_pte_range(tlb, addr, PAGE_SIZE);
>>> +                       if (pte_may_need_flush(oldpte, ptent))
>>> +                               tlb_flush_pte_range(tlb, addr, PAGE_SIZE);
> 
> You're choosing to avoid the flush, based on A/D bits read ahead of the
> actual modification of the PTE.
> 
> In this example, another thread can write into the range (sets A and D),
> and get a suitable TLB entry which goes unflushed while the rest of the
> kernel thinks the memory is write-protected and clean.
> 
> The only safe way to do this is to use XCHG/etc to modify the PTE, and
> base flush calculations on the results.  Atomic operations are ordered
> with A/D updates from pagewalks on other CPUs, even on AMD where A
> updates are explicitly not ordered with regular memory reads, for
> performance reasons.


Thanks for the feedback, Andrew, but I think the patch already does this in
exactly the safe manner you describe (at least on native x86, and I see a
similar path elsewhere as well):

oldpte = ptep_modify_prot_start()
-> __ptep_modify_prot_start()
-> ptep_get_and_clear()
-> native_ptep_get_and_clear()
-> xchg()

Note that the xchg() will clear the PTE (i.e., making it non-present), and
no further updates of A/D are possible until ptep_modify_prot_commit() is
called.
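
To illustrate the ordering argument, here is a minimal userspace sketch
using C11 atomics. It is only a model under stated assumptions, not kernel
code: the model_* names are hypothetical, the flag values merely mirror the
x86 bit positions, and model_hw_write_walk() is a stand-in for a hardware
page walk on another CPU.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for the model (same bit positions as x86). */
#define MODEL_PRESENT  (1u << 0)
#define MODEL_RW       (1u << 1)
#define MODEL_ACCESSED (1u << 5)
#define MODEL_DIRTY    (1u << 6)

typedef _Atomic uint64_t model_pte_t;

/*
 * Models the start of the transaction: atomically fetch the entry and
 * leave it cleared (non-present), as the xchg() at the end of the call
 * chain above does.
 */
static uint64_t model_modify_prot_start(model_pte_t *ptep)
{
	return atomic_exchange(ptep, 0);
}

/* Models the commit: publish the new entry. */
static void model_modify_prot_commit(model_pte_t *ptep, uint64_t newval)
{
	atomic_store(ptep, newval);
}

/*
 * Stand-in for a write access walking the page tables on another CPU:
 * A and D are set only while the entry is still present and writable.
 * Returns true if the write was allowed and the bits were set.
 */
static bool model_hw_write_walk(model_pte_t *ptep)
{
	uint64_t cur = atomic_load(ptep);

	while ((cur & MODEL_PRESENT) && (cur & MODEL_RW)) {
		if (atomic_compare_exchange_weak(ptep, &cur,
				cur | MODEL_ACCESSED | MODEL_DIRTY))
			return true;
	}
	return false;
}
```

In this model, any A/D update either lands before the exchange, in which
case it is visible in the returned old value, or it finds the entry cleared
and cannot happen at all.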

On non-SMP setups this is not atomic (no xchg), but since we hold the lock,
we should be safe.

I guess you are right that pte_may_need_flush() deserves a comment
clarifying that oldpte must be obtained by an atomic operation, so that no
A/D bits are lost (as you say).
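
As a rough sketch of what such a comment could look like, here is a
hypothetical, deliberately conservative flush predicate. It is NOT the
pte_may_need_flush() from the patch: the model_* names, the flag values and
the exact rules are assumptions made only to illustrate why the old value
has to come from the atomic exchange.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for the model (same bit positions as x86). */
#define MODEL_PRESENT  (1ull << 0)
#define MODEL_RW       (1ull << 1)
#define MODEL_ACCESSED (1ull << 5)
#define MODEL_DIRTY    (1ull << 6)

/*
 * Decide whether a TLB flush may be needed when an entry changes from
 * oldval to newval.
 *
 * oldval MUST be the value returned by the atomic operation that cleared
 * the entry. A snapshot read earlier can miss A/D bits set by a
 * concurrent page walk, and a required flush would then be skipped.
 */
static bool model_may_need_flush(uint64_t oldval, uint64_t newval)
{
	const uint64_t flags = MODEL_PRESENT | MODEL_RW |
			       MODEL_ACCESSED | MODEL_DIRTY;

	/* Nothing can be cached for an entry that was not present. */
	if (!(oldval & MODEL_PRESENT))
		return false;

	/* The mapping went away: stale translations must be dropped. */
	if (!(newval & MODEL_PRESENT))
		return true;

	/* Any change outside the modelled flags (e.g. the frame number). */
	if ((oldval ^ newval) & ~flags)
		return true;

	/* Conservative: flush if RW, A or D was set and is now clear. */
	return (oldval & ~newval &
		(MODEL_RW | MODEL_ACCESSED | MODEL_DIRTY)) != 0;
}
```

With these rules a pure promotion (only adding RW) needs no flush, while a
dirty bit that is visible only in the atomically fetched old value does
force one.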

Yet, I do not see a correctness problem. Am I missing something?

Re: [RFC 03/20] mm/mprotect: do not flush on permission promotion
