On 12.11.2013 18:22, Matthew Ahrens wrote:
On Tue, Nov 12, 2013 at 7:51 AM, Saso Kiselkov <[email protected]> wrote:

    On 11/12/13, 3:38 PM, Alexander Motin wrote:
     > Hi.
     >
     > While doing some performance tests I've found that LZ4 compression in
     > ZFS on FreeBSD allocates hash memory directly from VM each time, which
     > on a multi-core system under significant load may consume more CPU
     > time than the compression itself. On 64-bit illumos that memory is
     > allocated on the stack, but FreeBSD's kernel stack is smaller and does
     > not have sufficient space (16K). I've made a quite simple patch to
     > reduce the allocation overhead by creating an allocation cache, the
     > same as is done for ZIO. While for 64-bit illumos this patch is a nop,
     > smaller architectures may still benefit from it, the same as FreeBSD
     > does.
     >
     > Any comments about it: http://people.freebsd.org/~mav/lz4_alloc.patch ?
     >

    After a bit of benchmarking Illumos switched to using kmem_alloc for LZ4
    compression as well (discarding the stack allocations, because they were
    fragile and didn't do much for performance). It'd be interesting to see
    why kmem operations on FreeBSD are so inefficient under load - perhaps
    some worthwhile refactoring work there? Or can you please post more
    details of your testing setup?


My understanding is that on FreeBSD, kmem_cache_alloc() uses
uma_zalloc_arg(), which has fast, per-CPU caches of free buffers (like
illumos).  But on FreeBSD, kmem_alloc() uses malloc(), which is slower
(whereas on illumos, kmem_alloc() just calls kmem_cache_alloc() from an
appropriately-sized cache).  See
sys/cddl/compat/opensolaris/kern/opensolaris_kmem.c for details.

Does anyone know the reasoning behind this?  I.e. why kmem_alloc() does
not have similar performance characteristics on FreeBSD as on illumos?

FreeBSD malloc() does use uma_zalloc_arg() caches for small allocations.
For large allocations that approach is less usable, because large per-CPU
caches tend to eat too much extra memory, and it is quite hard to purge
those per-CPU caches in low-memory conditions. But considering that
illumos has a kmem_cache_alloc() KPI at all, there should probably also
be some difference from plain kmem_alloc().
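
To make the distinction concrete, here is a rough userland sketch in C of the
two layers discussed above: a fixed-size object cache that keeps freed buffers
on a free list, and a general-purpose allocator that picks an
appropriately-sized cache and falls back to plain malloc() for large requests.
All toy_* names and the size classes are hypothetical, chosen only for
illustration. This is not the illumos kmem or FreeBSD UMA code, and it leaves
out the per-CPU layers, locking and low-memory reclaim that the real
allocators need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Toy, single-threaded object cache: a free list of fixed-size buffers.
 * Real kernel caches add per-CPU layers and locking; both are omitted.
 */
typedef struct toy_cache {
	size_t bufsize;       /* size of every buffer in this cache */
	void *free_head;      /* free buffers, linked through their first word */
	unsigned long hits;   /* allocations served from the free list */
	unsigned long misses; /* allocations that fell through to malloc() */
} toy_cache_t;

static void *
toy_cache_alloc(toy_cache_t *c)
{
	if (c->free_head != NULL) {
		void *buf = c->free_head;
		c->free_head = *(void **)buf;	/* pop from the free list */
		c->hits++;
		return (buf);
	}
	c->misses++;
	return (malloc(c->bufsize));
}

static void
toy_cache_free(toy_cache_t *c, void *buf)
{
	assert(buf != NULL);
	*(void **)buf = c->free_head;	/* push; the buffer is kept, not freed */
	c->free_head = buf;
}

/* Hypothetical size classes; anything larger bypasses the caches. */
#define	TOY_NCLASSES	3
static toy_cache_t toy_classes[TOY_NCLASSES] = {
	{ .bufsize = 64 }, { .bufsize = 1024 }, { .bufsize = 16384 },
};

/* Return the smallest class that fits the request, or NULL if none does. */
static toy_cache_t *
toy_class_for(size_t size)
{
	for (int i = 0; i < TOY_NCLASSES; i++)
		if (size <= toy_classes[i].bufsize)
			return (&toy_classes[i]);
	return (NULL);
}

/* General-purpose entry points layered on top of the caches. */
static void *
toy_alloc(size_t size)
{
	toy_cache_t *c = toy_class_for(size);
	return (c != NULL ? toy_cache_alloc(c) : malloc(size));
}

static void
toy_free(void *buf, size_t size)
{
	toy_cache_t *c = toy_class_for(size);
	if (c != NULL)
		toy_cache_free(c, buf);
	else
		free(buf);
}
```

With this layout, a 16K buffer that is allocated, freed and allocated again
(as happens on every compression call) is served from the free list the
second time, while a request above the largest class goes straight to
malloc() every time. The cost is the one mentioned above: cached buffers stay
allocated while idle, and that cost grows with buffer size and with the
number of per-CPU copies a real implementation would keep.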

--
Alexander Motin
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer
