On 2026/2/2 21:42, Peter Zijlstra wrote:
On Mon, Feb 02, 2026 at 09:23:07PM +0800, Lance Yang wrote:
Hmm... we need MB rather than RMB on the sync side. Is that correct?
Walker:
[W]active_lockless_pt_walk_mm = mm -> MB -> [L]page-tables
Sync:
[W]page-tables -> MB -> [L]active_lockless_pt_walk_mm
This can work -- but only if the walker and sync touch the same
page-table address.
Now, typically I would imagine they both share the p4d/pud address at
the very least, right?
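The Walker/Sync pairing quoted above is the classic store-buffering (SB) pattern. Each side stores, executes a full barrier, then loads what the other side stored, and the full barriers on both sides forbid the outcome in which both loads miss. The following is a minimal user-space sketch. It assumes C11 atomics, with atomic_thread_fence(memory_order_seq_cst) standing in for smp_mb(). The variables active_walk and page_table are hypothetical models of active_lockless_pt_walk_mm and a single page-table entry. The guarantee only holds because both sides access the same page_table location, which is Peter's point about a shared address.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins: active_walk models active_lockless_pt_walk_mm,
 * page_table models one page-table entry that both sides touch. */
static atomic_int active_walk;
static atomic_int page_table;
static int walker_saw_pt, sync_saw_active;

static void *walker(void *arg)
{
    (void)arg;
    atomic_store_explicit(&active_walk, 1, memory_order_relaxed);   /* [W] */
    atomic_thread_fence(memory_order_seq_cst);                       /* MB  */
    walker_saw_pt = atomic_load_explicit(&page_table, memory_order_relaxed); /* [L] */
    return NULL;
}

static void *sync_side(void *arg)
{
    (void)arg;
    atomic_store_explicit(&page_table, 1, memory_order_relaxed);    /* [W] */
    atomic_thread_fence(memory_order_seq_cst);                       /* MB  */
    sync_saw_active = atomic_load_explicit(&active_walk, memory_order_relaxed); /* [L] */
    return NULL;
}

/* Runs the pair iters times and counts the forbidden outcome in which the
 * walker saw the old page table AND the sync side did not see the walker. */
static int count_forbidden(int iters)
{
    int forbidden = 0;

    for (int i = 0; i < iters; i++) {
        pthread_t a, b;

        atomic_store(&active_walk, 0);
        atomic_store(&page_table, 0);
        pthread_create(&a, NULL, walker, NULL);
        pthread_create(&b, NULL, sync_side, NULL);
        pthread_join(a, NULL);   /* join makes the plain int results visible */
        pthread_join(b, NULL);
        if (walker_saw_pt == 0 && sync_saw_active == 0)
            forbidden++;
    }
    return forbidden;
}
```

If the sync side's fence is weakened to a read-only barrier, the store to page_table is no longer ordered before the load of active_walk. The both-miss outcome then becomes allowed by the memory model, although whether it is observed on a given machine depends on the hardware.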
Thanks. I think I see the confusion ...
To be clear, the goal is not to make the walker see page-table writes
through the MB pairing, but to wait for any concurrent lockless page
table walkers to finish.
The flow is:
1) Page tables are modified
2) TLB flush is done
3) Read active_lockless_pt_walk_mm (with an MB to order the page-table
   writes before this read) to find which CPUs are locklessly walking
   this mm
4) IPI those CPUs
5) The IPI forces them to sync, so after the IPI returns, any in-flight
   lockless page table walk has finished (or will restart and see the
   new page tables)
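Steps 3 to 5 can be sketched in user space as follows. This is a minimal sketch under several assumptions. NR_CPUS, struct mm, active_walk_mm[], walker_begin(), walker_end(), cpus_walking() and wait_for_walkers() are hypothetical names, not kernel APIs. An explicit seq_cst fence stands in for the MB rather than relying on the TLB flush to provide one. Because user space cannot send IPIs, the sketch spins until each flagged CPU clears its slot. In the kernel, the IPI's return is meaningful because the lockless walker runs with interrupts disabled (as GUP-fast does), so it cannot take the IPI until its walk has ended.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 8                 /* hypothetical CPU count for the sketch */

struct mm { int id; };            /* hypothetical stand-in for struct mm_struct */

/* Hypothetical per-CPU slot modelling active_lockless_pt_walk_mm. */
static _Atomic(struct mm *) active_walk_mm[NR_CPUS];

/* Walker side: publish the mm, then a full barrier before page-table loads. */
static void walker_begin(int cpu, struct mm *mm)
{
    atomic_store_explicit(&active_walk_mm[cpu], mm, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);   /* pairs with the sync-side MB */
    /* ... lockless page-table loads would happen here ... */
}

/* Release: the walk's loads complete before the slot is cleared. */
static void walker_end(int cpu)
{
    atomic_store_explicit(&active_walk_mm[cpu], NULL, memory_order_release);
}

/* Step 3: after the page-table writes (and TLB flush), a full barrier orders
 * those writes before the scan; returns a bitmask of CPUs walking this mm. */
static uint32_t cpus_walking(struct mm *mm)
{
    uint32_t mask = 0;

    atomic_thread_fence(memory_order_seq_cst);
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (atomic_load_explicit(&active_walk_mm[cpu], memory_order_relaxed) == mm)
            mask |= 1u << cpu;
    return mask;
}

/* Steps 4-5: instead of an IPI, wait until each flagged CPU has left its walk.
 * This may also wait on a later walk of the same mm, which is conservative. */
static void wait_for_walkers(struct mm *mm, uint32_t mask)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (mask & (1u << cpu))
            while (atomic_load_explicit(&active_walk_mm[cpu],
                                        memory_order_acquire) == mm)
                ;   /* spin */
}
```

Only CPUs whose slot matches the mm are waited on, which mirrors sending the IPI to a subset of CPUs rather than broadcasting it.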
The synchronization relies on the IPI to ensure that walkers have stopped
before the sync side continues.
I would assume the TLB flush (step 2) should imply some barrier.
Does that clarify?