Overall on POWER8, this series increases the vfork+exec+exit microbenchmark rate by 15.6%, and the mmap+munmap rate by 81%. Slice code/data size is reduced by 1kB, and the maximum stack overhead through the slice_get_unmapped_area call goes from 992 to 448 bytes. The cost is 288 bytes added to the mm_context_t per mm for the slice masks on Book3S.
Since v1:
- Fixed a couple of bugs and compile errors on 8xx.
- Hopefully accounted for all of Christophe's review feedback.
- Got rid of unrelated "cleanup" hunks, and split one into its own patch.
- Dropped the patch to dynamically limit bitmap operations. This may be
  revisited after Aneesh's 4TB patches.

Thanks,
Nick

Nicholas Piggin (10):
  powerpc/mm/slice: Simplify and optimise slice context initialisation
  powerpc/mm/slice: tidy lpsizes and hpsizes update loops
  powerpc/mm/slice: pass pointers to struct slice_mask where possible
  powerpc/mm/slice: implement a slice mask cache
  powerpc/mm/slice: implement slice_check_range_fits
  powerpc/mm/slice: Switch to 3-operand slice bitops helpers
  powerpc/mm/slice: remove dead code
  powerpc/mm/slice: Use const pointers to cached slice masks where possible
  powerpc/mm/slice: remove radix calls to the slice code
  powerpc/mm/slice: use the dynamic high slice size to limit bitmap operations

 arch/powerpc/include/asm/book3s/64/mmu.h |  18 ++
 arch/powerpc/include/asm/hugetlb.h       |   7 +-
 arch/powerpc/include/asm/mmu-8xx.h       |  10 +
 arch/powerpc/include/asm/slice.h         |   8 +-
 arch/powerpc/mm/hugetlbpage.c            |   6 +-
 arch/powerpc/mm/mmu_context_book3s64.c   |   9 +-
 arch/powerpc/mm/mmu_context_nohash.c     |   5 +-
 arch/powerpc/mm/slice.c                  | 461 ++++++++++++++++---------------
 8 files changed, 277 insertions(+), 247 deletions(-)

--
2.16.1