On Tue, Nov 12, 2013 at 8:28 AM, Alexander Motin <[email protected]> wrote:
> On 12.11.2013 18:22, Matthew Ahrens wrote:
>
>> On Tue, Nov 12, 2013 at 7:51 AM, Saso Kiselkov <[email protected]
>> <mailto:[email protected]>> wrote:
>>
>>     On 11/12/13, 3:38 PM, Alexander Motin wrote:
>>     > Hi.
>>     >
>>     > While doing some performance tests I've found that LZ4 compression in
>>     > ZFS on FreeBSD each time allocates hash memory directly from VM, that on
>>     > multi-core system under significant load may consume more CPU time then
>>     > the compression itself. On 64-bit illumos that memory is allocated on
>>     > stack, but FreeBSD's kernel stack is smaller and has no sufficient space
>>     > (16K). I've made quite simple patch to reduce the allocation overhead by
>>     > creating allocation cache, same as it is done for ZIO. While for 64bit
>>     > illumos this patch is a nop, smaller architectures may still benefit
>>     > from it, same as FreeBSD does.
>>     >
>>     > Any comments about it: http://people.freebsd.org/~mav/lz4_alloc.patch ?
>>     >
>>
>>     After a bit of benchmarking Illumos switched to using kmem_alloc for LZ4
>>     compression as well (discarding the stack allocations, because they were
>>     fragile and didn't do much for performance). It'd be interesting to see
>>     why kmem operations on FreeBSD are so inefficient under load - perhaps
>>     some worthwhile refactoring work there? Or can you please post more
>>     details of your testing setup?
>>
>>
>> My understanding is that on FreeBSD, kmem_cache_alloc() uses
>> uma_zalloc_arg(), which has fast, per-CPU caches of free buffers (like
>> illumos). But on FreeBSD, kmem_alloc() uses malloc(), which is slower
>> (whereas on illumos, kmem_alloc() just calls kmem_cache_alloc() from an
>> appropriately-sized cache). See
>> sys/cddl/compat/opensolaris/kern/opensolaris_kmem.c for details.
>>
>> Does anyone know the reasoning behind this? I.e. why kmem_alloc() does
>> not have similar performance characteristics on FreeBSD as on illumos?
>>
>
> FreeBSD malloc() does use uma_zalloc_arg() caches for small allocations.
> For big it is less usable because large per-CPU caches tend to eat too much
> extra memory and it is quite hard to purge those per-CPU caches in
> low-memory condition. But considering that illumos at all has
> kmem_cache_alloc() KPI there is also probably should be some difference
> from plain kmem_alloc().
>

Yes, primarily the ability to use constructors/destructors to save time
when allocating. But it's true that illumos kmem_alloc() also falls back on
a slow path (vmem_alloc()) for large allocations -- those above 128KB.

--matt
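
For readers following the thread, here is a rough user-space sketch of the
object-cache idea discussed above: buffers are kept on a free list after
release, so a later allocation reuses an already-constructed buffer instead
of going back to the general-purpose allocator. This is not the illumos or
FreeBSD implementation. All names (obj_cache_t, obj_cache_alloc(), and so
on) are hypothetical, the code is single-threaded, and it has none of the
per-CPU layers that the real kmem and UMA caches have.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for a kernel object cache. */
typedef struct obj_cache {
	size_t	oc_size;		/* size of every buffer in this cache */
	void	(*oc_ctor)(void *);	/* runs once, when a buffer is created */
	void	(*oc_dtor)(void *);	/* runs once, when a buffer is destroyed */
	void	**oc_free;		/* stack of idle, still-constructed buffers */
	size_t	oc_nfree;		/* number of buffers currently on the stack */
	size_t	oc_maxfree;		/* capacity of the stack */
} obj_cache_t;

/* Demo-only constructor that counts how often construction happens. */
static int ctor_calls;
static void count_ctor(void *buf) { (void)buf; ctor_calls++; }

obj_cache_t *
obj_cache_create(size_t size, size_t maxfree,
    void (*ctor)(void *), void (*dtor)(void *))
{
	obj_cache_t *oc = malloc(sizeof (*oc));

	if (oc == NULL)
		return (NULL);
	/* Allocate at least one slot so malloc(0) is never relied upon. */
	oc->oc_free = malloc((maxfree ? maxfree : 1) * sizeof (void *));
	if (oc->oc_free == NULL) {
		free(oc);
		return (NULL);
	}
	oc->oc_size = size;
	oc->oc_ctor = ctor;
	oc->oc_dtor = dtor;
	oc->oc_nfree = 0;
	oc->oc_maxfree = maxfree;
	return (oc);
}

void *
obj_cache_alloc(obj_cache_t *oc)
{
	void *buf;

	/* Fast path: hand back an idle buffer, skipping the constructor. */
	if (oc->oc_nfree > 0)
		return (oc->oc_free[--oc->oc_nfree]);

	/* Slow path: go to the general allocator and construct. */
	buf = malloc(oc->oc_size);
	if (buf != NULL && oc->oc_ctor != NULL)
		oc->oc_ctor(buf);
	return (buf);
}

void
obj_cache_free(obj_cache_t *oc, void *buf)
{
	/* Keep the buffer constructed and idle if there is room. */
	if (oc->oc_nfree < oc->oc_maxfree) {
		oc->oc_free[oc->oc_nfree++] = buf;
		return;
	}
	/* Cache is full: destroy the buffer for real. */
	if (oc->oc_dtor != NULL)
		oc->oc_dtor(buf);
	free(buf);
}

void
obj_cache_destroy(obj_cache_t *oc)
{
	/* Destroy every idle buffer, then the cache itself. */
	while (oc->oc_nfree > 0) {
		void *buf = oc->oc_free[--oc->oc_nfree];

		if (oc->oc_dtor != NULL)
			oc->oc_dtor(buf);
		free(buf);
	}
	free(oc->oc_free);
	free(oc);
}
```

With a cache created as obj_cache_create(16384, 4, count_ctor, NULL), an
alloc/free/alloc sequence returns the same buffer the second time and runs
the constructor only once. The bounded free list is a crude stand-in for
the memory-versus-speed trade-off mentioned in the quoted reply: a larger
limit means fewer slow-path allocations but more memory held idle.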
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer
