On Thu, 4 Aug 2016 12:01:13 -0700 Aruna Ramakrishna <aruna.ramakris...@oracle.com> wrote:

> On large systems, when some slab caches grow to millions of objects (and
> many gigabytes), running 'cat /proc/slabinfo' can take up to 1-2 seconds.
> During this time, interrupts are disabled while walking the slab lists
> (slabs_full, slabs_partial, and slabs_free) for each node, and this
> sometimes causes timeouts in other drivers (for instance, Infiniband).
>
> This patch optimizes 'cat /proc/slabinfo' by maintaining a counter for
> total number of allocated slabs per node, per cache. This counter is
> updated when a slab is created or destroyed. This enables us to skip
> traversing the slabs_full list while gathering slabinfo statistics, and
> since slabs_full tends to be the biggest list when the cache is large, it
> results in a dramatic performance improvement. Getting slabinfo statistics
> now only requires walking the slabs_free and slabs_partial lists, and
> those lists are usually much smaller than slabs_full. We tested this after
> growing the dentry cache to 70GB, and the performance improved from 2s to
> 5ms.

I assume this is tested on both slab and slub?

It isn't the smallest of patches but given the seriousness of the problem
I think I'll tag it for -stable backporting.
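
For anyone following along, the counting scheme the changelog describes can be
sketched roughly as below. This is a simplified userspace illustration, not the
patch itself: struct node_lists, node_add_slab(), node_destroy_free_slab(),
node_mark_full() and node_count_full() are hypothetical names, the lists are
plain singly linked lists, and locking and interrupt handling are left out.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a slab descriptor. */
struct slab {
    struct slab *next;
};

/* Hypothetical per-node, per-cache state: three lists plus one counter. */
struct node_lists {
    struct slab *slabs_full;
    struct slab *slabs_partial;
    struct slab *slabs_free;
    unsigned long num_slabs;    /* total slabs across all three lists */
};

/* O(n) walk; in this scheme it is only needed for the two short lists. */
static unsigned long list_len(const struct slab *s)
{
    unsigned long n = 0;

    for (; s; s = s->next)
        n++;
    return n;
}

/* Slab creation: a new slab starts out free, and the counter goes up. */
static void node_add_slab(struct node_lists *n, struct slab *s)
{
    s->next = n->slabs_free;
    n->slabs_free = s;
    n->num_slabs++;
}

/* Slab destruction: drop one free slab, and the counter goes down. */
static struct slab *node_destroy_free_slab(struct node_lists *n)
{
    struct slab *s = n->slabs_free;

    if (!s)
        return NULL;
    n->slabs_free = s->next;
    s->next = NULL;
    n->num_slabs--;
    return s;
}

/* Moving a slab between lists does not touch the counter. */
static void node_mark_full(struct node_lists *n)
{
    struct slab *s = n->slabs_free;

    if (!s)
        return;
    n->slabs_free = s->next;
    s->next = n->slabs_full;
    n->slabs_full = s;
}

/* Full-slab count derived by subtraction, without walking slabs_full. */
static unsigned long node_count_full(const struct node_lists *n)
{
    return n->num_slabs - list_len(n->slabs_partial)
                        - list_len(n->slabs_free);
}
```

The point of the subtraction in node_count_full() is that the cost of gathering
the statistics then depends only on the lengths of slabs_partial and
slabs_free, which the changelog says are usually much shorter than slabs_full.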