On 11/12/13, 4:08 PM, Alexander Motin wrote:
> On 12.11.2013 17:51, Saso Kiselkov wrote:
>> On 11/12/13, 3:38 PM, Alexander Motin wrote:
>>> Hi.
>>>
>>> While doing some performance tests I've found that LZ4 compression in
>>> ZFS on FreeBSD each time allocates hash memory directly from VM, that on
>>> multi-core system under significant load may consume more CPU time then
>>> the compression itself. On 64-bit illumos that memory is allocated on
>>> stack, but FreeBSD's kernel stack is smaller and has no sufficient space
>>> (16K). I've made quite simple patch to reduce the allocation overhead by
>>> creating allocation cache, same as it is done for ZIO. While for 64bit
>>> illumos this patch is a nop, smaller architectures may still benefit
>>> from it, same as FreeBSD does.
>>>
>>> Any comments about it: http://people.freebsd.org/~mav/lz4_alloc.patch ?
>>
>> After a bit of benchmarking Illumos switched to using kmem_alloc for LZ4
>> compression as well (discarding the stack allocations, because they were
>> fragile and didn't do much for performance). It'd be interesting to see
>> why kmem operations on FreeBSD are so inefficient under load - perhaps
>> some worthwhile refactoring work there?
>
> Because allocations above page size (16K > 4K) are not cached by
> allocator. Probably it could be improved and some work is going on
> there, but as I can see illumos in case of ZIO in ZFS also explicitly
> uses kmem_cache_create() to handle probably alike issues.
>
>> Or can you please post more details of your testing setup?
>
> That was SPEC 2008 NFS benchmark on 2x6x2-core Xeon system, quickly
> creating huge amount of files sized from 1K to several megabytes on FS
> with LZ4 compression enabled. Without this patch profiler shown me about
> 20% of adaptive lock spinning around free call, doing also TLB
> invalidation on all CPU cores. With this patch I see no any issues from
> allocation at all.
>
Interesting. Could you try switching to using an explicit kmem cache? I
considered doing this when changing the implementation in Illumos, but I
saw no performance benefits. If they are there when the system is under
memory pressure, then it's certainly something we'd like to fix on all
platforms.

-- 
Saso
_______________________________________________
developer mailing list
http://lists.open-zfs.org/mailman/listinfo/developer
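
For readers following the thread, here is a rough userland sketch of the
idea under discussion: keep freed 16K hash buffers on a free list so the
next compression call reuses one instead of going back to the underlying
allocator. This is not the code from lz4_alloc.patch and it does not use
the kernel kmem_cache_create() API. The names buf_cache_t, cache_alloc(),
cache_free() and the CACHE_MAX_FREE cap are hypothetical, malloc() stands
in for the VM-backed allocation, and locking and per-CPU layers are left
out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define HASH_BUF_SIZE  (16 * 1024) /* 16K hash table, as in the thread */
#define CACHE_MAX_FREE 8           /* hypothetical cap on cached buffers */

/* While a buffer sits in the cache, its first bytes hold the list link. */
typedef union hash_buf {
    union hash_buf *next;
    unsigned char bytes[HASH_BUF_SIZE];
} hash_buf_t;

typedef struct buf_cache {
    hash_buf_t *free_list;   /* singly linked list of idle buffers */
    size_t nfree;            /* number of buffers on free_list */
    size_t backend_allocs;   /* times we fell through to malloc() */
} buf_cache_t;

void cache_init(buf_cache_t *c)
{
    c->free_list = NULL;
    c->nfree = 0;
    c->backend_allocs = 0;
}

/* Hand out a cached buffer if one is idle, else ask the backend. */
void *cache_alloc(buf_cache_t *c)
{
    hash_buf_t *b = c->free_list;

    if (b != NULL) {
        c->free_list = b->next;
        c->nfree--;
        return b;
    }
    c->backend_allocs++;
    return malloc(sizeof(hash_buf_t));
}

/* Keep the buffer for reuse unless the cache is already full. */
void cache_free(buf_cache_t *c, void *p)
{
    hash_buf_t *b = p;

    if (b == NULL)
        return;
    if (c->nfree >= CACHE_MAX_FREE) {
        free(b);
        return;
    }
    b->next = c->free_list;
    c->free_list = b;
    c->nfree++;
}

/* Release every idle buffer back to the backend. */
void cache_fini(buf_cache_t *c)
{
    while (c->free_list != NULL) {
        hash_buf_t *b = c->free_list;

        c->free_list = b->next;
        free(b);
    }
    c->nfree = 0;
}
```

In this sketch a compress call would bracket its work with cache_alloc()
and cache_free(); after warm-up the backend is only touched when more
buffers are in flight than have been cached. A real kernel cache would
also need synchronization, which is deliberately not shown.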
