On Tue, 2017-11-07 at 07:53:09 UTC, Nicholas Piggin wrote: > Unmaps that free page tables always flush the entire PID, which is > sub-optimal. Provide TLB range flushing with an additional PWC flush > that can be use for va range invalidations with PWC flush. > > Time to munmap N pages of memory including last level page table > teardown (after mmap, touch), local invalidate: > N 1 2 4 8 16 32 64 > vanilla 3.2us 3.3us 3.4us 3.6us 4.1us 5.2us 7.2us > patched 1.4us 1.5us 1.7us 1.9us 2.6us 3.7us 6.2us > > Global invalidate: > N 1 2 4 8 16 32 64 > vanilla 2.2us 2.3us 2.4us 2.6us 3.2us 4.1us 6.2us > patched 2.1us 2.5us 3.4us 5.2us 8.7us 15.7us 6.2us > > Local invalidates get much better across the board. Global ones have > the same issue where multiple tlbies for va flush do get slower than > the single tlbie to invalidate the PID. None of this test captures > the TLB benefits of avoiding killing everything. > > Global gets worse, but it is brought in to line with global invalidate > for munmap()s that do not free page tables. > > Signed-off-by: Nicholas Piggin <npig...@gmail.com>
Applied to powerpc next, thanks. https://git.kernel.org/powerpc/c/0b2f5a8a792755c88bd786f89712a9 cheers