Here is some additional data showing how the performance scales on
larger binaries. I used the exact same setup as I described in the
previous email, only replacing the 64M stap binary with Firefox's 1G
libxul.so.debug:

$ hyperfine --runs 5 -i --warmup 2 'eu-readelf -C1 -N -w libxul.so.debug'

thread safety enabled, this patch applied (__atomic builtins, lazy loading)
  Time (mean ± σ):     24.117 s ±  0.074 s    [User: 23.836 s, System: 0.234 s]
  Range (min … max):   24.041 s … 24.210 s    5 runs

thread safety enabled, v1 patch applied (eager loading)
  Time (mean ± σ):     24.436 s ±  0.185 s    [User: 24.143 s, System: 0.245 s]
  Range (min … max):   24.207 s … 24.632 s    5 runs

thread safety enabled, main branch (rwlock, lazy loading)
  Time (mean ± σ):     25.179 s ±  0.154 s    [User: 24.904 s, System: 0.226 s]
  Range (min … max):   25.020 s … 25.384 s    5 runs

thread safety disabled, main branch (lazy loading)
  Time (mean ± σ):     23.957 s ±  0.124 s    [User: 23.681 s, System: 0.230 s]
  Range (min … max):   23.769 s … 24.095 s    5 runs

The results with libxul.so.debug are consistent with what I reported
for the stap binary. With this patch applied, `eu-readelf -N -w` with
thread safety enabled is just 0.7% slower than the same command with
thread safety disabled. Patch v1, with eager abbrev loading, is 1.3%
slower than this patch, and the existing main branch thread-safe
implementation is 4.4% slower than this patch.

Aaron
