On 12.11.2013 17:51, Saso Kiselkov wrote:
On 11/12/13, 3:38 PM, Alexander Motin wrote:
Hi.
While doing some performance tests I've found that LZ4 compression in
ZFS on FreeBSD allocates hash table memory directly from the VM on each
call, which on a multi-core system under significant load may consume
more CPU time than the compression itself. On 64-bit illumos that memory
is allocated on the stack, but FreeBSD's kernel stack is smaller and has
no room for the 16K required. I've made a quite simple patch that
reduces the allocation overhead by creating an allocation cache, the
same as is done for ZIO. While on 64-bit illumos this patch is a no-op,
smaller architectures may still benefit from it, the same as FreeBSD does.
Any comments about it: http://people.freebsd.org/~mav/lz4_alloc.patch ?
After a bit of benchmarking, illumos switched to using kmem_alloc for
LZ4 compression as well (discarding the stack allocations, because they
were fragile and didn't do much for performance). It'd be interesting to
see why kmem operations on FreeBSD are so inefficient under load -
perhaps there is some worthwhile refactoring work there?
Because allocations above the page size (16K > 4K) are not cached by
the allocator. That could probably be improved, and some work is going
on there, but as I can see, illumos' ZFS also explicitly uses
kmem_cache_create() for ZIO, probably to handle similar issues.
Or can you please post more details of your testing setup?
That was the SPEC 2008 NFS benchmark on a 2x6x2-core Xeon system,
quickly creating a huge number of files sized from 1K to several
megabytes on a filesystem with LZ4 compression enabled. Without this
patch the profiler showed about 20% of CPU time in adaptive lock
spinning around the free call, which also does TLB invalidation on all
CPU cores. With this patch I see no issues from allocation at all.
--
Alexander Motin
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer