Enabling CONFIG_LOCKDEP and other related debug options will greatly reduce system performance. This patchset aims to reduce the performance slowdown caused by the lockdep code.

Patch 1 just removes an inline function that wasn't used. Patches 2 and 3 are minor tweaks to optimize the code. Patch 4 makes class->ops a per-cpu counter. Patch 5 moves the lock_release() call outside of the lock critical section.

Parallel kernel compilation tests (make -j <#cpu>) were performed on 2 different systems:

 1) a 1-socket 22-core 44-thread Skylake system
 2) a 4-socket 72-core 144-thread Broadwell system

The build times with the pre-patch and post-patch debug kernels were:

  System      Pre-patch   Post-patch   %Change
  ------      ---------   ----------   -------
  1-socket    8m53.9s     8m41.2s      -2.4%
  4-socket    7m27.0s     5m31.0s      -26%

I think it is the last 2 patches that yield most of the performance improvement.

Waiman Long (5):
  locking/lockdep: Remove add_chain_cache_classes()
  locking/lockdep: Eliminate redundant irqs check in __lock_acquire()
  locking/lockdep: Add a faster path in __lock_release()
  locking/lockdep: Make class->ops a percpu counter
  locking/lockdep: Call lock_release after releasing the lock

 include/linux/lockdep.h          |   2 +-
 include/linux/rwlock_api_smp.h   |  16 +++---
 include/linux/spinlock_api_smp.h |   8 +--
 kernel/locking/lockdep.c         | 120 ++++++++++++---------------------------
 4 files changed, 48 insertions(+), 98 deletions(-)

-- 
1.8.3.1
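Why a per-cpu counter helps on the large box can be illustrated with a userspace analogy. This is a minimal sketch, not the code from patch 4: it assumes one pthread plays the role of one CPU, and every name in it (padded_counter, run_sharded, MAX_SLOTS, and so on) is hypothetical. Each thread increments only its own cache-line-aligned slot, so the hot increment path never bounces a shared cache line between sockets; a reader pays the cost instead by summing all slots. In the kernel the same idea is usually expressed with DEFINE_PER_CPU() and this_cpu_inc(), with the reader walking for_each_possible_cpu() and per_cpu().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical sizes for the sketch; one slot per thread ("CPU"). */
#define MAX_SLOTS  64
#define CACHE_LINE 64

/* Each counter sits on its own cache line to avoid false sharing. */
struct padded_counter {
	_Alignas(CACHE_LINE) unsigned long count;
};

static struct padded_counter slots[MAX_SLOTS];

struct worker_arg {
	int slot;
	unsigned long iters;
};

/* Fast path: only this thread ever writes slots[a->slot], so no atomics. */
static void *worker(void *p)
{
	struct worker_arg *a = p;

	for (unsigned long i = 0; i < a->iters; i++)
		slots[a->slot].count++;
	return NULL;
}

/* Slow path: a reader sums every slot, like summing a per-CPU counter. */
static unsigned long sum_slots(void)
{
	unsigned long total = 0;

	for (int i = 0; i < MAX_SLOTS; i++)
		total += slots[i].count;
	return total;
}

/* Runs nthreads workers doing iters increments each; returns the total.
 * pthread_join() orders the workers' writes before the summing reads. */
static unsigned long run_sharded(int nthreads, unsigned long iters)
{
	pthread_t tids[MAX_SLOTS];
	struct worker_arg args[MAX_SLOTS];

	assert(nthreads > 0 && nthreads <= MAX_SLOTS);
	for (int i = 0; i < MAX_SLOTS; i++)
		slots[i].count = 0;
	for (int i = 0; i < nthreads; i++) {
		args[i].slot = i;
		args[i].iters = iters;
		pthread_create(&tids[i], NULL, worker, &args[i]);
	}
	for (int i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
	return sum_slots();
}
```

For example, run_sharded(4, 100000) returns 400000. The trade-off is the usual one for per-cpu counters: increments become cheap and contention-free, while reading an exact total requires visiting every slot.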
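The idea behind patch 5 is an ordering change in the unlock wrappers (the diffstat touches include/linux/spinlock_api_smp.h and include/linux/rwlock_api_smp.h). The sketch below is a userspace toy, not the kernel code: toy_lock and debug_release_hook() are hypothetical, with the hook standing in for lockdep's lock_release(). Running the debug bookkeeping while the lock is still held lengthens the critical section that waiters see; running it after the release store lets a waiter acquire the lock sooner. The sketch only shows the ordering; it does not address whether the bookkeeping is safe to do without the lock held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy test-and-set lock standing in for a raw spinlock. */
struct toy_lock {
	atomic_bool locked;
};

/* Hypothetical stand-in for lockdep's lock_release(); it records whether
 * the lock word was still set when the bookkeeping ran. */
static bool hook_saw_lock_held;
static int hook_calls;

static void debug_release_hook(struct toy_lock *l)
{
	hook_saw_lock_held = atomic_load_explicit(&l->locked, memory_order_relaxed);
	hook_calls++;
}

static void toy_lock_acquire(struct toy_lock *l)
{
	bool expected = false;

	while (!atomic_compare_exchange_weak_explicit(&l->locked, &expected, true,
						      memory_order_acquire,
						      memory_order_relaxed))
		expected = false;	/* a failed CAS overwrote it; reset */
}

/* Old ordering: bookkeeping inside the critical section. */
static void toy_unlock_hook_first(struct toy_lock *l)
{
	debug_release_hook(l);
	atomic_store_explicit(&l->locked, false, memory_order_release);
}

/* New ordering: release the lock first, then do the bookkeeping. */
static void toy_unlock_hook_after(struct toy_lock *l)
{
	atomic_store_explicit(&l->locked, false, memory_order_release);
	debug_release_hook(l);
}
```

In a single-threaded run, the hook observes the lock as held with toy_unlock_hook_first() and as free with toy_unlock_hook_after(), which is exactly the shortening of the held window that the reordering is meant to buy.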