On 11/12/13, 4:16 PM, Alexander Motin wrote:
> On 12.11.2013 18:12, Saso Kiselkov wrote:
>> On 11/12/13, 4:08 PM, Alexander Motin wrote:
>>> On 12.11.2013 17:51, Saso Kiselkov wrote:
>>>> On 11/12/13, 3:38 PM, Alexander Motin wrote:
>>>>> Hi.
>>>>>
>>>>> While doing some performance tests I've found that LZ4 compression in
>>>>> ZFS on FreeBSD allocates its hash memory directly from the VM each
>>>>> time, which on a multi-core system under significant load may consume
>>>>> more CPU time than the compression itself. On 64-bit illumos that
>>>>> memory is allocated on the stack, but FreeBSD's kernel stack is
>>>>> smaller and has insufficient space (16K). I've made a quite simple
>>>>> patch that reduces the allocation overhead by creating an allocation
>>>>> cache, the same as is done for ZIO. While for 64-bit illumos this
>>>>> patch is a no-op, smaller architectures may still benefit from it,
>>>>> the same as FreeBSD does.
>>>>>
>>>>> Any comments about it:
>>>>> http://people.freebsd.org/~mav/lz4_alloc.patch ?
>>>>
>>>> After a bit of benchmarking, illumos switched to using kmem_alloc for
>>>> LZ4 compression as well (discarding the stack allocations, because
>>>> they were fragile and didn't do much for performance). It'd be
>>>> interesting to see why kmem operations on FreeBSD are so inefficient
>>>> under load - perhaps some worthwhile refactoring work there?
>>>
>>> Because allocations above page size (16K > 4K) are not cached by the
>>> allocator. That could probably be improved, and some work is going on
>>> there, but as far as I can see, illumos also explicitly uses
>>> kmem_cache_create() in the ZIO case in ZFS, probably to handle
>>> similar issues.
>>>
>>>> Or can you please post more details of your testing setup?
>>>
>>> That was the SPEC 2008 NFS benchmark on a 2x6x2-core Xeon system,
>>> quickly creating a huge number of files sized from 1K to several
>>> megabytes on a FS with LZ4 compression enabled. Without this patch the
>>> profiler showed me about 20% of CPU time in adaptive lock spinning
>>> around the free call, which also does TLB invalidation on all CPU
>>> cores. With this patch I see no issues from allocation at all.
>>>
>> Interesting. Could you try switching to using an explicit kmem cache?
>
> That is what I did in my patch. Or do you mean something else?
Sorry, got your change confused with what Pawel was suggesting (using an
enlarged stack). Looks good - you can even get rid of the HEAPMODE
conditionals there. We should always use heap/cache allocation and never
the unreliable stack stuff (a rough sketch of what I mean is below).

Cheers,
--
Saso
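A minimal sketch of the unconditional cache path, assuming illumos-style
kmem interfaces. The cache name, the size constant, and the compressor
wrapper below are illustrative placeholders, not the actual patch:

#include <sys/kmem.h>
#include <sys/systm.h>

/*
 * Illustrative working-buffer size; the real code would use the size of
 * the LZ4 hash-table structure, which is roughly 16K.
 */
#define	LZ4_CTX_SIZE	(16 * 1024)

static kmem_cache_t *lz4_cache;

void
lz4_init(void)
{
	lz4_cache = kmem_cache_create("lz4_cache", LZ4_CTX_SIZE,
	    0, NULL, NULL, NULL, NULL, NULL, 0);
}

void
lz4_fini(void)
{
	if (lz4_cache != NULL) {
		kmem_cache_destroy(lz4_cache);
		lz4_cache = NULL;
	}
}

static int
lz4_compress_ctx(const char *src, char *dst, int isize, int osize)
{
	/*
	 * Take a pre-sized buffer from the cache instead of the stack
	 * (too small on FreeBSD) or a raw >page-size allocation (which
	 * the FreeBSD allocator won't cache).
	 */
	void *ctx = kmem_cache_alloc(lz4_cache, KM_SLEEP);
	int ret;

	bzero(ctx, LZ4_CTX_SIZE);
	ret = 0;	/* ... call the real match-finder with ctx ... */

	kmem_cache_free(lz4_cache, ctx);
	return (ret);
}

The cache amortizes the large allocation across calls, which sidesteps
exactly the uncached >4K path (and its cross-CPU TLB shootdowns on free)
that showed up in the profile.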
