On Tue, Dec 19, 2017 at 05:53:36PM -0800, Matthew Wilcox wrote:
> On Tue, Dec 19, 2017 at 04:20:51PM -0800, Paul E. McKenney wrote:
> > If we are going to make this sort of change, we should do so in a way
> > that allows the slab code to actually do the optimizations that might
> > make this sort of thing worthwhile. After all, if the main goal was small
> > code size, the best approach is to drop kfree_bulk() and get on with life
> > in the usual fashion.
> >
> > I would prefer to believe that something like kfree_bulk() can help,
> > and if that is the case, we should give it a chance to do things like
> > group kfree_rcu() requests by destination slab and soforth, allowing
> > batching optimizations that might provide more significant increases
> > in performance. Furthermore, having this in slab opens the door to
> > slab taking emergency action when memory is low.
>
> kfree_bulk does sort by destination slab; look at build_detached_freelist.

Understood, but beside the point. I suspect that giving it larger scope
makes it more efficient, similar to disk drives in the old days. Grouping
on the stack when processing RCU callbacks limits what can reasonably
be done. Furthermore, using the vector approach going into the grace
period is much more cache-efficient than the linked-list approach, given
that the blocks have a reasonable chance of going cache-cold during the
grace period.

And the slab-related operations should really be in the slab code in any
case rather than within RCU.

							Thanx, Paul
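
To make the vector-versus-linked-list point concrete, here is a minimal
userspace sketch. It is not kernel code and not how kfree_bulk() or
kfree_rcu() are implemented. Every name in it (struct entry,
struct free_batch, batch_add, batch_flush, BATCH_MAX) is hypothetical,
and slab_id is an assumed stand-in for "destination slab" that the
caller supplies. The idea it illustrates: pointers queued in a vector can
be sorted and grouped at flush time without dereferencing the objects
themselves, whereas a list threaded through the objects has to touch
each (possibly cache-cold) object just to find the next one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BATCH_MAX 64	/* hypothetical batch capacity */

/* One deferred free: the pointer plus its (assumed) destination slab. */
struct entry {
	void *ptr;
	int slab_id;
};

/* Vector of deferred frees, filled while waiting for the grace period. */
struct free_batch {
	size_t count;
	struct entry e[BATCH_MAX];
};

/* Queue a pointer; returns 0 on success, -1 if the batch is full. */
static int batch_add(struct free_batch *b, void *ptr, int slab_id)
{
	if (b->count >= BATCH_MAX)
		return -1;
	b->e[b->count].ptr = ptr;
	b->e[b->count].slab_id = slab_id;
	b->count++;
	return 0;
}

/* qsort() comparator: order entries by slab_id. */
static int cmp_entry(const void *a, const void *b)
{
	const struct entry *x = a, *y = b;

	return (x->slab_id > y->slab_id) - (x->slab_id < y->slab_id);
}

/*
 * Sort by slab, free everything, and return the number of distinct
 * slab groups seen, i.e. how many per-slab bulk operations a real
 * allocator would have needed.  Only the vector is read while sorting
 * and grouping; the objects themselves are not dereferenced here.
 */
static size_t batch_flush(struct free_batch *b)
{
	size_t groups = 0;

	assert(b->count <= BATCH_MAX);
	qsort(b->e, b->count, sizeof(b->e[0]), cmp_entry);
	for (size_t i = 0; i < b->count; i++) {
		if (i == 0 || b->e[i].slab_id != b->e[i - 1].slab_id)
			groups++;
		free(b->e[i].ptr);	/* stand-in for a per-slab bulk free */
	}
	b->count = 0;
	return groups;
}
```

A caller would batch_add() each block as it is queued and call
batch_flush() once the grace period has elapsed. The larger the batch,
the more entries land in each slab group, which is the "larger scope"
argument in miniature.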