Hello, Linus.

A lot of changes for percpu this time around.  percpu inherited the
same area allocator from the original pre-virtual-address-mapped
implementation.  This was from the time when the percpu allocator
wasn't used all that much and the implementation was focused on
simplicity, with the unfortunate computational complexity of O(number
of areas allocated from the chunk) per alloc / free.

With the increase in percpu usage, we're hitting cases where the lack
of scalability is hurting.  The most prominent one right now is bpf
percpu map creation / destruction, which may allocate and free a lot
of entries consecutively.  It's likely that the problem will become
more prominent in the future.

To address the issue, Dennis replaced the area allocator with a hinted
bitmap allocator, whose performance is more consistent.  While the new
allocator does perform a bit worse in some cases, it outperforms the
old allocator by well over an order of magnitude in other, more common
scenarios while staying mostly flat in CPU overhead and completely
flat in memory consumption.
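
For readers unfamiliar with the idea, here is a minimal userspace
sketch of a bitmap allocator with a single "first free" hint.  It is
NOT the code in mm/percpu.c: the toy_* names, the 256-unit chunk size
and the first-fit policy are all assumptions made for illustration,
and the real series additionally keeps per-block metadata and contig
hints (see the shortlog).  The only point is that a hint lets the scan
skip the fully allocated prefix instead of walking every area.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy parameters, not taken from the kernel. */
#define TOY_NBITS 256
#define TOY_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define TOY_NWORDS ((TOY_NBITS + TOY_BITS_PER_WORD - 1) / TOY_BITS_PER_WORD)

struct toy_chunk {
	unsigned long map[TOY_NWORDS];	/* one bit per unit, set = allocated */
	size_t first_free;		/* hint: no free bit below this index */
};

static int toy_test(const struct toy_chunk *c, size_t i)
{
	return (c->map[i / TOY_BITS_PER_WORD] >> (i % TOY_BITS_PER_WORD)) & 1UL;
}

static void toy_set(struct toy_chunk *c, size_t i, int val)
{
	unsigned long mask = 1UL << (i % TOY_BITS_PER_WORD);

	if (val)
		c->map[i / TOY_BITS_PER_WORD] |= mask;
	else
		c->map[i / TOY_BITS_PER_WORD] &= ~mask;
}

void toy_init(struct toy_chunk *c)
{
	memset(c, 0, sizeof(*c));
}

/* First-fit search for n contiguous free units; returns start or -1. */
long toy_alloc(struct toy_chunk *c, size_t n)
{
	size_t i, j, run = 0;

	if (n == 0 || n > TOY_NBITS)
		return -1;

	/* Start at the hint: everything below it is known to be in use. */
	for (i = c->first_free; i < TOY_NBITS; i++) {
		if (toy_test(c, i)) {
			run = 0;
			continue;
		}
		if (++run == n) {
			size_t start = i + 1 - n;

			for (j = start; j <= i; j++)
				toy_set(c, j, 1);
			/* Move the hint past any newly allocated prefix. */
			while (c->first_free < TOY_NBITS &&
			       toy_test(c, c->first_free))
				c->first_free++;
			return (long)start;
		}
	}
	return -1;
}

/* Release n units starting at start and pull the hint back if needed. */
void toy_free(struct toy_chunk *c, size_t start, size_t n)
{
	size_t i;

	assert(start + n <= TOY_NBITS);
	for (i = start; i < start + n; i++) {
		assert(toy_test(c, i));	/* catch double frees in the toy */
		toy_set(c, i, 0);
	}
	if (start < c->first_free)
		c->first_free = start;
}
```

Freeing only has to clear bits and possibly lower the hint, so its
cost depends on the size of the freed region rather than on how many
other areas live in the chunk.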

Thanks.

The following changes since commit 5771a8c08880cdca3bfb4a3fc6d309d6bba20877:

  Linux v4.13-rc1 (2017-07-15 15:22:10 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu.git for-4.14

for you to fetch changes up to 5e81ee3e6a79cc9fa85af5c3db0f1f269709bbf1:

  percpu: update header to contain bitmap allocator explanation. (2017-07-26 17:41:06 -0400)

----------------------------------------------------------------
Dennis Zhou (Facebook) (27):
      percpu: pcpu-stats change void buffer to int buffer
      percpu: change the format for percpu_stats output
      percpu: expose pcpu_nr_empty_pop_pages in pcpu_stats
      percpu: update the header comment and pcpu_build_alloc_info comments
      percpu: setup_first_chunk enforce dynamic region must exist
      percpu: introduce start_offset to pcpu_chunk
      percpu: remove has_reserved from pcpu_chunk
      percpu: setup_first_chunk remove dyn_size and consolidate logic
      percpu: unify allocation of schunk and dchunk
      percpu: end chunk area maps page aligned for the populated bitmap
      percpu: setup_first_chunk rename schunk/dchunk to chunk
      percpu: modify base_addr to be region specific
      percpu: combine percpu address checks
      percpu: change the number of pages marked in the first_chunk pop bitmap
      percpu: introduce nr_empty_pop_pages to help empty page accounting
      percpu: increase minimum percpu allocation size and align first regions
      percpu: generalize bitmap (un)populated iterators
      percpu: replace area map allocator with bitmap
      percpu: introduce bitmap metadata blocks
      percpu: add first_bit to keep track of the first free in the bitmap
      percpu: skip chunks if the alloc does not fit in the contig hint
      percpu: keep track of the best offset for contig hints
      percpu: update alloc path to only scan if contig hints are broken
      percpu: update free path to take advantage of contig hints
      percpu: use metadata blocks to update the chunk contig hint
      percpu: update pcpu_find_block_fit to use an iterator
      percpu: update header to contain bitmap allocator explanation.

 include/linux/percpu.h |   20 +-
 init/main.c            |    1 -
 mm/percpu-internal.h   |   82 ++-
 mm/percpu-km.c         |    2 +-
 mm/percpu-stats.c      |  111 ++--
 mm/percpu.c            | 1522 ++++++++++++++++++++++++++++++------------------
 6 files changed, 1112 insertions(+), 626 deletions(-)

-- 
tejun
