On 31/01/2021 01:07, Andy Lutomirski wrote:
> Adding Andrew Cooper, who has a distressingly extensive understanding
> of the x86 PTE magic.

Pretty sure it is all learning things the hard way...

> On Sat, Jan 30, 2021 at 4:16 PM Nadav Amit <nadav.a...@gmail.com> wrote:
>> diff --git a/mm/mprotect.c b/mm/mprotect.c
>> index 632d5a677d3f..b7473d2c9a1f 100644
>> --- a/mm/mprotect.c
>> +++ b/mm/mprotect.c
>> @@ -139,7 +139,8 @@ static unsigned long change_pte_range(struct mmu_gather *tlb,
>>                                  ptent = pte_mkwrite(ptent);
>>                          }
>>                          ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
>> -                        tlb_flush_pte_range(tlb, addr, PAGE_SIZE);
>> +                        if (pte_may_need_flush(oldpte, ptent))
>> +                                tlb_flush_pte_range(tlb, addr, PAGE_SIZE);

You're choosing to avoid the flush, based on A/D bits read ahead of the
actual modification of the PTE.

In this example, another thread can write into the range (sets A and D),
and get a suitable TLB entry which goes unflushed while the rest of the
kernel thinks the memory is write-protected and clean.

The only safe way to do this is to use XCHG/etc to modify the PTE, and
base flush calculations on the results.  Atomic operations are ordered
with A/D updates from pagewalks on other CPUs, even on AMD where A
updates are explicitly not ordered with regular memory reads, for
performance reasons.

~Andrew
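
For illustration only, here is a minimal user-space sketch of the
"XCHG the PTE, then decide from what came back" approach described
above.  It uses C11 atomics as a stand-in for the kernel's PTE
accessors.  model_pte_t, model_may_need_flush() and
model_write_protect() are hypothetical names, and the flush policy
(flush if the replaced entry was present and Accessed) is a deliberately
simple assumption, not the pte_may_need_flush() from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural x86 PTE bit positions. */
#define PTE_PRESENT  (UINT64_C(1) << 0)
#define PTE_RW       (UINT64_C(1) << 1)
#define PTE_ACCESSED (UINT64_C(1) << 5)
#define PTE_DIRTY    (UINT64_C(1) << 6)

/* Hypothetical model of a PTE slot that other CPUs' pagewalks may update. */
typedef _Atomic(uint64_t) model_pte_t;

/*
 * Assumed, simplified policy: a TLB entry can only exist if the entry was
 * present and a pagewalk had already set Accessed.
 */
static bool model_may_need_flush(uint64_t replaced)
{
    return (replaced & PTE_PRESENT) && (replaced & PTE_ACCESSED);
}

/*
 * Write-protect one PTE.  Returns true if the caller should flush.
 *
 * The atomic exchange returns the value that was really in the slot at
 * the moment it was cleared, including any A/D bits set after an earlier
 * plain read.  The flush decision is based on that value, never on a
 * snapshot taken before the modification.
 */
static bool model_write_protect(model_pte_t *pte)
{
    assert(pte != NULL);

    /* XCHG: once the slot is zero, no pagewalk can set A/D through it. */
    uint64_t replaced = atomic_exchange(pte, 0);

    /* Build the new entry from the exchanged value, keeping its A/D bits. */
    uint64_t updated = replaced & ~PTE_RW;
    atomic_store(pte, updated);

    return model_may_need_flush(replaced);
}
```

If another thread sets A and D between an early read of the PTE and the
exchange, model_may_need_flush() on the early snapshot says "no flush",
while model_write_protect() still reports that a flush is needed,
because it looks at the value the exchange actually replaced.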