On Mon, 18 Mar 2019, Robert Mustacchi wrote:
> Hi Bob,
>
> Thanks for digging into this. It's useful to have other takes on this. I
> see these all as reasons that we should look at improving libumem. So a
> few notes:
>
> 1) libumem isn't the default allocator, but a number of things link
> against it so it can easily end up being pulled in. The default libc
> allocator is even worse in a multi-threaded environment.
>
> 2) Right now when the alignment is a bit larger via
> memalign/posix_memalign, we end up bypassing the traditional umem
> caches, which can make that end up performing poorer.

My objective in requesting aligned buffers is to avoid cache-line thrashing and to give SSE2-style code an opportunity to work. The allocation size is also rounded up to a multiple of the cache line size. Linux malloc already appears to provide the desired alignment by default, but Solaris malloc has been observed to be more space efficient, which leads to unexpected and unpredictable cache-line thrashing.

I do not need to use posix_memalign() since I have a good work-around based on malloc()/free().
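
For anyone unfamiliar with this kind of work-around, here is a minimal sketch of one common way to get cache-line-aligned buffers from plain malloc()/free(). It is not the GraphicsMagick code: the names aligned_malloc, aligned_free, and round_up, and the 64-byte CACHE_LINE_SIZE, are assumptions chosen for illustration. The idea is to over-allocate, round the address up to the alignment, and stash the pointer malloc() returned just before the aligned block so it can be handed back to free() later.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed cache line size; a real program would detect or configure this. */
#define CACHE_LINE_SIZE 64

/* Round size up to a multiple of align (align must be a power of two). */
static size_t round_up(size_t size, size_t align)
{
    return (size + align - 1) & ~(align - 1);
}

/*
 * Hypothetical helper: return a block whose address is a multiple of
 * align and whose usable size is rounded up to a multiple of align.
 * The pointer returned by malloc() is saved just below the block.
 */
static void *aligned_malloc(size_t align, size_t size)
{
    /* align must be a power of two large enough to hold a pointer. */
    assert(align >= sizeof(void *) && (align & (align - 1)) == 0);

    size_t padded = round_up(size, align);
    if (padded < size || padded > SIZE_MAX - align - sizeof(void *))
        return NULL;                    /* request too large */

    void *raw = malloc(padded + align + sizeof(void *));
    if (raw == NULL)
        return NULL;

    /* Leave room for the saved pointer, then round the address up. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + align - 1) & ~((uintptr_t)align - 1);

    ((void **)aligned)[-1] = raw;       /* remember what to free() */
    return (void *)aligned;
}

/* Release a block obtained from aligned_malloc(); NULL is a no-op. */
static void aligned_free(void *ptr)
{
    if (ptr != NULL)
        free(((void **)ptr)[-1]);
}
```

A call such as aligned_malloc(CACHE_LINE_SIZE, 100) would hand back 128 usable bytes starting on a 64-byte boundary, at the cost of up to one extra cache line plus a pointer per allocation.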

> 3) Based on your analysis of the different algorithms in use, would it
> be possible to synthesize a bit more of a microbenchmark that describes
> the various allocation and free patterns that are going on? That might
> help us understand that a little bit better and see where we can
> generally improve things.

I am still working to understand the issue myself. There are really very few allocations over a ten-second run, but sometimes 40 allocation requests (e.g. on a 40-thread system) can arrive at the same time, each from a different thread.
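
As a starting point for such a microbenchmark, something along these lines might reproduce the burst pattern: a fixed number of threads meet at a barrier and then all call posix_memalign() at once, over and over. This is only a sketch. The thread count matches the 40-thread system mentioned above, but BURSTS, ALIGNMENT, and BLOCK_SIZE are guesses rather than values measured from GraphicsMagick, and burst_worker and run_burst_benchmark are made-up names.

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define NTHREADS   40           /* matches the 40-thread system above */
#define BURSTS     100          /* assumed number of simultaneous bursts */
#define ALIGNMENT  64           /* assumed cache line size */
#define BLOCK_SIZE (64 * 1024)  /* assumed buffer size, not measured */

static pthread_barrier_t burst_barrier;

/* Each worker waits at the barrier so that all requests arrive together. */
static void *burst_worker(void *arg)
{
    intptr_t ok = 0;

    (void)arg;
    for (int i = 0; i < BURSTS; i++) {
        void *p = NULL;

        pthread_barrier_wait(&burst_barrier);
        if (posix_memalign(&p, ALIGNMENT, BLOCK_SIZE) == 0) {
            *(volatile char *)p = 1;    /* touch the block */
            free(p);
            ok++;
        }
    }
    return (void *)ok;                  /* successful allocations */
}

/* Run all bursts; return the total number of successful allocations. */
static long run_burst_benchmark(void)
{
    pthread_t tids[NTHREADS];
    long total = 0;
    int rc = pthread_barrier_init(&burst_barrier, NULL, NTHREADS);

    assert(rc == 0);
    for (int i = 0; i < NTHREADS; i++) {
        /* A missing thread would deadlock the barrier, so give up. */
        if (pthread_create(&tids[i], NULL, burst_worker, NULL) != 0)
            abort();
    }
    for (int i = 0; i < NTHREADS; i++) {
        void *result = NULL;

        rc = pthread_join(tids[i], &result);
        assert(rc == 0);
        total += (long)(intptr_t)result;
    }
    pthread_barrier_destroy(&burst_barrier);
    return total;
}
```

A small main() that calls run_burst_benchmark() could then be run under plockstat with libumem linked in, to check whether the same vmem_xalloc/vmem_xfree contention shows up outside the application.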

I used a tool called 'plockstat' to look at locking, and in the problem cases libumem accounts for (by far) most of the time spent blocked on mutexes. Here are the high-runner cases, continuing down until finally reaching a lock in my own application, a lock in the dynamic loader, and a lock used by GCC's gomp:


Mutex block

-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
 1906 14758238 0xa46030                     libumem.so.1`vmem_xalloc+0xfc

      nsec ---- Time Distribution --- count Stack
      4096 |@                       |   121 libc.so.1`mutex_lock_impl+0x189
      8192 |@@                      |   182 libc.so.1`mutex_lock+0x13
     16384 |                        |     9 libumem.so.1`vmem_xalloc+0xfc
     32768 |@                       |   118 libumem.so.1`memalign+0xb0
     65536 |@                       |    88 libc.so.1`posix_memalign+0x41
    131072 |                        |    59 gm`MagickMallocAligned+0x38
    262144 |@                       |    96 gm`AllocateCacheNexus+0x13
    524288 |@                       |   123 gm`AcquireCacheNexus+0x137
   1048576 |@@                      |   168 gm`AcquireCacheViewPixels+0x6c
   2097152 |@@                      |   195 gm`InterpolateViewColor+0x42
   4194304 |@                       |   143 gm`WaveImage._omp_fn.4+0x166
   8388608 |@                       |   151
  16777216 |@@                      |   162
  33554432 |@                       |   107
  67108864 |@                       |   109
 134217728 |                        |    61
 268435456 |                        |    14
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
 1587 16720081 0xa46030                     libumem.so.1`vmem_xfree+0x3e

      nsec ---- Time Distribution --- count Stack
      2048 |                        |     2 libc.so.1`mutex_lock_impl+0x189
      4096 |@                       |    84 libc.so.1`mutex_lock+0x13
      8192 |@@                      |   155 libumem.so.1`vmem_xfree+0x3e
     16384 |                        |    11 libumem.so.1`process_free+0x122
     32768 |@                       |   104 libumem.so.1`umem_malloc_free+0x1d
     65536 |@                       |    83 gm`AcquireCacheNexus+0x2fa
    131072 |                        |    63 gm`AcquireCacheViewPixels+0x6c
    262144 |@                       |    70 gm`InterpolateViewColor+0x42
    524288 |@                       |    86 gm`WaveImage._omp_fn.4+0x166
   1048576 |@@                      |   153 libgomp.so.1.0.0`gomp_thread_start+0x18d
   2097152 |@@                      |   141 libc.so.1`_thrp_setup+0x8a
   4194304 |@                       |   118
   8388608 |@                       |   122
  16777216 |@                       |   131
  33554432 |@                       |    96
  67108864 |@                       |    90
 134217728 |                        |    63
 268435456 |                        |    13
 536870912 |                        |     2
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
   98 13914404 0xa46030                     libumem.so.1`vmem_xfree+0x3e

      nsec ---- Time Distribution --- count Stack
      4096 |@                       |     5 libc.so.1`mutex_lock_impl+0x189
      8192 |@@@                     |    14 libc.so.1`mutex_lock+0x13
     16384 |                        |     2 libumem.so.1`vmem_xfree+0x3e
     32768 |@@                      |     9 libumem.so.1`process_free+0x122
     65536 |@                       |     6 libumem.so.1`umem_malloc_free+0x1d
    131072 |                        |     3 gm`AcquireCacheNexus+0x2fa
    262144 |                        |     3 gm`AcquireCacheViewPixels+0x6c
    524288 |                        |     3 gm`InterpolateViewColor+0x42
   1048576 |@@                      |     9 gm`WaveImage._omp_fn.4+0x166
   2097152 |@                       |     6 libgomp.so.1.0.0`GOMP_parallel+0x40
   4194304 |@                       |     7 gm`WaveImage+0x185
   8388608 |@                       |     8
  16777216 |@                       |     6
  33554432 |@                       |     6
  67108864 |@                       |     8
 134217728 |                        |     3
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
    9  3881187 0xa46030                     libumem.so.1`vmem_xalloc+0xfc

      nsec ---- Time Distribution --- count Stack
     16384 |@@@@@                   |     2 libc.so.1`mutex_lock_impl+0x189
     32768 |@@                      |     1 libc.so.1`mutex_lock+0x13
     65536 |@@@@@                   |     2 libumem.so.1`vmem_xalloc+0xfc
    131072 |@@                      |     1 libumem.so.1`memalign+0xb0
    262144 |                        |     0 libc.so.1`posix_memalign+0x41
    524288 |                        |     0 gm`MagickMallocAligned+0x38
   1048576 |@@                      |     1 gm`SetNexus+0x55d
   2097152 |                        |     0 gm`AcquireCacheNexus+0xd1
   4194304 |                        |     0 gm`AcquireCacheViewPixels+0x6c
   8388608 |                        |     0 gm`InterpolateViewColor+0x42
  16777216 |@@@@@                   |     2 gm`WaveImage._omp_fn.4+0x166
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
    9   430535 libc.so.1`_uberdata+0x2a20   libc.so.1`_lwp_start

      nsec ---- Time Distribution --- count Stack
      8192 |@@                      |     1 libc.so.1`lmutex_lock+0xf8
     16384 |                        |     0 libc.so.1`tls_setup+0x72
     32768 |@@@@@                   |     2 libc.so.1`_thrp_setup+0x55
     65536 |                        |     0 libc.so.1`_lwp_start
    131072 |@@                      |     1
    262144 |                        |     0
    524288 |@@@@@@@@                |     3
   1048576 |@@@@@                   |     2
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
    4   296960 0xb351c0                     gm`LockSemaphoreInfo+0x3d

      nsec ---- Time Distribution --- count Stack
      8192 |@@@@@@                  |     1 libc.so.1`mutex_lock_impl+0x189
     16384 |                        |     0 libc.so.1`mutex_lock+0x13
     32768 |                        |     0 gm`LockSemaphoreInfo+0x3d
     65536 |                        |     0 gm`ModifyCache+0x56
    131072 |@@@@@@                  |     1 gm`SetCacheNexus+0x5c
    262144 |                        |     0 gm`SetCacheViewPixels+0x6c
    524288 |@@@@@@@@@@@@            |     2 gm`ConstituteTextureImage._omp_fn.0+0xbf
                                            libgomp.so.1.0.0`gomp_thread_start+0x18d
                                            libc.so.1`_thrp_setup+0x8a
                                            libc.so.1`_lwp_start
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
    1  1048576 0xa46030                     libumem.so.1`vmem_xalloc+0x41d

      nsec ---- Time Distribution --- count Stack
   1048576 |@@@@@@@@@@@@@@@@@@@@@@@@|     1 libc.so.1`mutex_lock_impl+0x189
                                            libc.so.1`mutex_lock+0x13
                                            libumem.so.1`vmem_xalloc+0x41d
                                            libumem.so.1`memalign+0xb0
                                            libc.so.1`posix_memalign+0x41
                                            gm`MagickMallocAligned+0x38
                                            gm`SetNexus+0x55d
                                            gm`AcquireCacheNexus+0xd1
                                            gm`AcquireCacheViewPixels+0x6c
                                            gm`InterpolateViewColor+0x42
                                            gm`WaveImage._omp_fn.4+0x166
-------------------------------------------------------------------------------
Count     nsec Lock                         Caller
    4   114688 libc.so.1`_uberdata+0x2a20   libgomp.so.1.0.0`gomp_thread_start+0x24

      nsec ---- Time Distribution --- count Stack
     32768 |@@@@@@@@@@@@            |     2 libc.so.1`lmutex_lock+0xf8
     65536 |                        |     0 libc.so.1`slow_tls_get_addr+0x49
    131072 |@@@@@@                  |     1 libgomp.so.1.0.0`gomp_thread_start+0x24
    262144 |@@@@@@                  |     1 libc.so.1`_thrp_setup+0x8a
                                            libc.so.1`_lwp_start

--
Bob Friesenhahn
[email protected], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Public Key,     http://www.simplesystems.org/users/bfriesen/public-key.txt
