Yesterday I did a full GraphicsMagick benchmark run using builds linked with libumem and libmtmalloc.

Previously I had been focusing on just one problematic algorithm, for which libmtmalloc does 4X better than libumem. I also found that the OpenMP scheduling policy makes a huge difference. Static scheduling (which worked great on Solaris 10) causes a performance problem on Illumos with 20 cores (40 threads), because each thread is assigned a fixed share of the work up front and the loop cannot finish until the slowest thread completes its share. 'Guided' scheduling, which hands out chunks on demand and is thus less sensitive to latencies, works better. A minimal sketch of the difference follows.
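For anyone who wants to reproduce the scheduling effect outside of GraphicsMagick, here is a stand-alone sketch. This is not GraphicsMagick code; the per-row work is invented, and only the one-word schedule clause is the point:

#include <stdlib.h>

#define ROWS 4096L
#define COLS 4096L

/* Stand-in for per-row image work; the real algorithms differ. */
static void
process_row(unsigned char *row, long cols)
{
  long i;
  for (i = 0; i < cols; i++)
    row[i] ^= 0xff;
}

int
main(void)
{
  unsigned char *image = malloc(ROWS * COLS);
  long row;

  if (image == NULL)
    return 1;

  /*
   * schedule(static) pre-assigns a fixed block of rows to each
   * thread, so one delayed thread stalls the whole loop.
   * schedule(guided) hands out shrinking chunks on demand and
   * tolerates stragglers better.
   */
#pragma omp parallel for schedule(guided)
  for (row = 0; row < ROWS; row++)
    process_row(image + row * COLS, COLS);

  free(image);
  return 0;
}

Building with something like 'gcc -O2 -fopenmp sched.c' and switching guided to static shows the two policies side by side.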

The full run reveals that, on average, performance is better using libumem.

I found that one algorithm achieves no speed-up at all with libumem, but achieves a speed-up of 10.57X with libmtmalloc. CPU use is low in the no-speed-up case, which suggests the threads are blocked rather than spinning, and seems to rule out CPU-level contention such as cache-line thrashing.

The sensitive "canary" algorithms all work on small allocations (e.g. 32 bytes) at a time, and a couple of locks are involved as well. A rough sketch of that pattern follows.
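As an illustration of the pattern (the names and ratios here are invented; this is not GraphicsMagick code), something like the following hammers the allocator's small-object path from many threads while occasionally taking a shared lock:

#include <stdlib.h>
#include <string.h>
#include <pthread.h>

static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long shared_count;

int
main(void)
{
  long i;

#pragma omp parallel for schedule(guided)
  for (i = 0; i < 10000000L; i++)
    {
      /* Small allocation in the size class the canaries use. */
      char *p = malloc(32);
      if (p == NULL)
        continue;
      memset(p, 0, 32);

      /* Occasional shared lock, standing in for the real locks. */
      if ((i & 63) == 0)
        {
          pthread_mutex_lock(&shared_lock);
          shared_count++;
          pthread_mutex_unlock(&shared_lock);
        }
      free(p);
    }
  return 0;
}

Linking the same binary against -lumem versus -lmtmalloc (e.g. 'gcc -O2 -fopenmp alloc.c -lmtmalloc'), or preloading the libraries via LD_PRELOAD, should show whether the allocator alone reproduces the scaling gap.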

It feels like some sort of priority-inversion or scheduling issue is going on which is sensitive to the choice of memory allocator.

It is doubtful that the GNU GCC developers spend much effort tuning their pthreads-based OpenMP runtime (libgomp) under Illumos or Solaris.

Bob
--
Bob Friesenhahn
[email protected], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Public Key,     http://www.simplesystems.org/users/bfriesen/public-key.txt
