Overall on POWER8, this series increases the vfork+exec+exit
microbenchmark rate by 15.6% and the mmap+munmap rate by 81%. Slice
code/data size is reduced by 1kB, and max stack overhead through the
slice_get_unmapped_area call goes from 992 to 448 bytes. The cost is
288 bytes added to the mm_context_t per mm for the slice masks on
Book3S.

Since v1:
- Fixed a couple of bugs and compile errors on 8xx.
- Addressed all of Christophe's review feedback, hopefully.
- Got rid of unrelated "cleanup" hunks, and split one to its own patch.
- Dropped patch to dynamically limit bitmap operations. This may be
  revisited after Aneesh's 4TB patches.

Thanks,
Nick

Nicholas Piggin (10):
  powerpc/mm/slice: Simplify and optimise slice context initialisation
  powerpc/mm/slice: tidy lpsizes and hpsizes update loops
  powerpc/mm/slice: pass pointers to struct slice_mask where possible
  powerpc/mm/slice: implement a slice mask cache
  powerpc/mm/slice: implement slice_check_range_fits
  powerpc/mm/slice: Switch to 3-operand slice bitops helpers
  powerpc/mm/slice: remove dead code
  powerpc/mm/slice: Use const pointers to cached slice masks where
    possible
  powerpc/mm/slice: remove radix calls to the slice code
  powerpc/mm/slice: use the dynamic high slice size to limit bitmap
    operations

 arch/powerpc/include/asm/book3s/64/mmu.h |  18 ++
 arch/powerpc/include/asm/hugetlb.h       |   7 +-
 arch/powerpc/include/asm/mmu-8xx.h       |  10 +
 arch/powerpc/include/asm/slice.h         |   8 +-
 arch/powerpc/mm/hugetlbpage.c            |   6 +-
 arch/powerpc/mm/mmu_context_book3s64.c   |   9 +-
 arch/powerpc/mm/mmu_context_nohash.c     |   5 +-
 arch/powerpc/mm/slice.c                  | 461 ++++++++++++++++---------------
 8 files changed, 277 insertions(+), 247 deletions(-)

-- 
2.16.1
