On Mon, 16 Apr 2018, Vlastimil Babka wrote:
> On 04/16/2018 09:36 PM, Mikulas Patocka wrote:
> >>> I need to increase it just for dm-bufio slabs.
> >> If you do this then others will want the same...
> > If others need it, they can turn on the flag SLAB_MINIMIZE_WASTE too.
> I think it should be possible without a new flag. The slub allocator
> could just balance priorities (performance vs memory efficiency) better.
> Currently I get the impression that "slub_max_order" is a performance
> tunable. Let's add another criterion for selecting an order, that would
> try to pick an order to minimize wasted space below e.g. 10% with some
> different kind of max order. Pick good defaults, add tunables if you must.
> I mean, anyone who's creating a cache for 640KB objects most likely
> doesn't want to waste another 384KB by each such object. They shouldn't
> have to add a flag to let the slub allocator figure out that using 2MB
> pages is the right thing to do here.
The problem is that higher-order allocations (larger than 32K) are
unreliable. So, if you increase page order beyond that, the allocation may
fail.
dm-bufio deals gracefully with allocation failure, because it preallocates
some buffers with vmalloc, but other subsystems may not deal with it and
they could return ENOMEM randomly or misbehave in other ways. So, the
"SLAB_MINIMIZE_WASTE" flag is also saying that the allocation may fail and
the caller is prepared to deal with it.
The slub subsystem does fall back to a lower order when the allocation
fails (it allows a different order for each slab in the same cache), but
slab doesn't fall back, and you get NULL if the higher-order allocation fails.
So, SLAB_MINIMIZE_WASTE is needed for slab because it will just randomly
fail with higher order.
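
For reference, the waste arithmetic behind the 640KB example quoted above
can be sketched as below. This is only an illustration, not the kernel's
actual order calculation: the helper names slab_waste() and pick_order(),
the SKETCH_PAGE_SIZE constant and the 4K page size are all assumptions
made up for this sketch.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4K pages */

/* Bytes left unused when a slab of 2^order pages is packed with objects. */
static unsigned long slab_waste(unsigned long obj_size, unsigned int order)
{
	unsigned long slab_size = SKETCH_PAGE_SIZE << order;

	assert(obj_size > 0);
	return slab_size % obj_size;
}

/*
 * Hypothetical selection rule: return the smallest order up to max_order
 * whose waste is at most max_waste_pct percent of the slab, or -1 if no
 * such order exists.
 */
static int pick_order(unsigned long obj_size, unsigned int max_order,
		      unsigned int max_waste_pct)
{
	unsigned int order;

	for (order = 0; order <= max_order; order++) {
		unsigned long slab_size = SKETCH_PAGE_SIZE << order;

		if (slab_size < obj_size)
			continue;	/* not even one object fits */
		if (slab_waste(obj_size, order) * 100 <=
		    slab_size * max_waste_pct)
			return (int)order;
	}
	return -1;
}
```

With 640KB objects, slab_waste() gives 384KB at order 8 (1MB slab, one
object) and 128KB at order 9 (2MB slab, three objects), so a 10% limit
selects order 9. That is exactly the kind of high-order allocation that
may fail, which is why the caller has to be prepared for it.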