On Sat, 16 Mar 2019 at 12:09, Bob Friesenhahn
<[email protected]> wrote:
> Using the default allocator:
> % gm benchmark -duration 10 convert -size 4000x3000 tile:model.pnm -wave 25x150 null:
> Results: 40 threads 13 iter 74.63s user 10.122100s total 1.284 iter/s 0.174 iter/cpu
>
> Using libumem:
> % LD_PRELOAD_64=libumem.so.1 gm benchmark -duration 10 convert -size 4000x3000 tile:model.pnm -wave 25x150 null:
> Results: 40 threads 13 iter 77.28s user 10.226807s total 1.271 iter/s 0.168 iter/cpu
>
> Using mtmalloc:
> % LD_PRELOAD_64=libmtmalloc.so.1 gm benchmark -duration 10 convert -size 4000x3000 tile:model.pnm -wave 25x150 null:
> Results: 40 threads 64 iter 246.82s user 10.148286s total 6.306 iter/s 0.259 iter/cpu

Why was the last test 64 iterations instead of 13 like the others?

> Is this huge difference in performance due to mtmalloc expected?  I
> thought that modern libumem was supposed to make up most of the
> difference.

Do you know if the umem per-thread caching stuff is working here?  It
was originally added in:

    https://www.illumos.org/issues/4489

According to umem_alloc(3MALLOC), you can tune the per-thread cache
size with the UMEM_OPTIONS environment variable, and you can gather
various statistics by taking a core of the process at an appropriate
moment and running "::umastat" in mdb.


Cheers.

-- 
Joshua M. Clulow
Engineer @ Joyent
http://blog.sysmgr.org
