Hi,

The 1st patch reduces the memory footprint of percpu_ref in the fast path
from 7 words to 2 words, since percpu_ref is often used in the fast path
and embedded in user structs.
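To illustrate the idea behind the 1st patch, here is a minimal userspace
sketch of splitting a reference counter into an embedded 2-word fast-path
part and a separately allocated slow-path part. All names (ref_fat,
ref_slim, ref_slow_data, fake_rcu_head, ref_slim_init, ref_slim_exit) are
hypothetical stand-ins chosen for this example, not the kernel's
definitions, and the field list is an assumption about what belongs on
the slow path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef void (*release_fn)(void *ref);

/* Hypothetical stand-in for a two-pointer RCU callback head. */
struct fake_rcu_head {
	void *next;
	void (*func)(struct fake_rcu_head *head);
};

/* "Before": every field is embedded in the user's struct (7 words). */
struct ref_fat {
	intptr_t count;
	uintptr_t percpu_count_ptr;
	release_fn release;
	release_fn confirm_switch;
	bool force_atomic;
	bool allow_reinit;
	struct fake_rcu_head rcu;
};

/* Slow-path state, allocated separately and reached via a pointer. */
struct ref_slow_data {
	intptr_t count;
	release_fn release;
	release_fn confirm_switch;
	bool force_atomic;
	bool allow_reinit;
	struct fake_rcu_head rcu;
};

/* "After": only two words stay embedded in the user's struct. */
struct ref_slim {
	uintptr_t percpu_count_ptr;
	struct ref_slow_data *data;
};

_Static_assert(sizeof(struct ref_fat) == 7 * sizeof(void *),
	       "embedded layout takes 7 words");
_Static_assert(sizeof(struct ref_slim) == 2 * sizeof(void *),
	       "split layout takes 2 words");

/* Allocate the slow-path data; returns 0 on success, -1 on failure. */
static int ref_slim_init(struct ref_slim *ref, release_fn release)
{
	ref->data = calloc(1, sizeof(*ref->data));
	if (!ref->data)
		return -1;
	ref->data->count = 1;
	ref->data->release = release;
	ref->percpu_count_ptr = 0;
	return 0;
}

/* Free the slow-path data so it is not leaked when the ref goes away. */
static void ref_slim_exit(struct ref_slim *ref)
{
	free(ref->data);
	ref->data = NULL;
}
```

The cost of this layout is one extra pointer dereference and an allocation
on the slow path, in exchange for a smaller struct wherever the ref is
embedded.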
The 2nd patch moves .q_usage_counter to the 1st cacheline of
'request_queue'.

A simple test on null_blk shows a ~2% IOPS boost on a 16-core (two
threads per core), dual-socket NUMA machine.

V6:
	- drop the 1st patch which adds percpu_ref_is_initialized() for MD
	  only, since Christoph doesn't like it

V5:
	- fix memory leak on ref->data, only percpu_ref_exit() of patch 2
	  is modified.

V4:
	- rename percpu_ref_inited as percpu_ref_is_initialized

V3:
	- fix kernel oops on MD
	- add patch for avoiding use of percpu-refcount internals from md
	  code
	- pass Red Hat CKI test which is done by Veronika Kabatova

V2:
	- pass 'gfp' to kzalloc() for fixing block/027 failure reported by
	  kernel test robot
	- protect percpu_ref_is_zero() against destroying percpu-refcount
	  with a spin lock

Ming Lei (2):
  percpu_ref: reduce memory footprint of percpu_ref in fast path
  block: move 'q_usage_counter' into front of 'request_queue'

 drivers/infiniband/sw/rdmavt/mr.c |   2 +-
 include/linux/blkdev.h            |   3 +-
 include/linux/percpu-refcount.h   |  45 ++++------
 lib/percpu-refcount.c             | 131 ++++++++++++++++++++++--------
 4 files changed, 118 insertions(+), 63 deletions(-)

Cc: Veronika Kabatova <vkaba...@redhat.com>
Cc: Sagi Grimberg <s...@grimberg.me>
Cc: Tejun Heo <t...@kernel.org>
Cc: Christoph Hellwig <h...@lst.de>
Cc: Jens Axboe <ax...@kernel.dk>
Cc: Bart Van Assche <bvanass...@acm.org>
-- 
2.25.2