On Tuesday, September 5, 2017 4:51:42 PM CEST Mathieu Desnoyers wrote: > ----- On Sep 5, 2017, at 10:34 AM, Milian Wolff [email protected] wrote: > > Hey all, > > > > I have noticed a very large overhead when linking against liblttng-ust: > > > > ~~~~~ > > ┌milian@milian-kdab2:/tmp > > └$ cat lttng-test.c > > int main() > > { > > > > return 0; > > > > } > > ┌milian@milian-kdab2:/tmp > > └$ gcc -O2 -g -ldl lttng-test.c > > ┌milian@milian-kdab2:/tmp > > └$ perf stat -r 5 ./a.out > > > > Performance counter stats for './a.out' (5 runs): > > 0.209587 task-clock (msec) # 0.596 CPUs utilized > > > > ( +- 8.76% ) > > > > 0 context-switches # 0.000 K/sec > > 0 cpu-migrations # 0.000 K/sec > > > > 49 page-faults # 0.235 M/sec > > > > ( +- 1.19% ) > > > > 706,854 cycles # 3.373 GHz > > > > ( +- 8.82% ) > > > > 773,603 instructions # 1.09 insn per cycle > > > > ( +- 0.75% ) > > > > 147,128 branches # 701.987 M/sec > > > > ( +- 0.70% ) > > > > 4,096 branch-misses # 2.78% of all > > branches > > > > ( +- 5.27% ) > > > > 0.000351422 seconds time elapsed > > > > ( +- 11.85% ) > > > > ┌milian@milian-kdab2:/tmp > > └$ gcc -O2 -g -ldl -llttng-ust lttng-test.c > > ┌milian@milian-kdab2:/tmp > > └$ perf stat -r 5 ./a.out > > > > Performance counter stats for './a.out' (5 runs): > > 2.063040 task-clock (msec) # 0.009 CPUs utilized > > > > ( +- 1.37% ) > > > > 44 context-switches # 0.021 M/sec > > > > ( +- 1.95% ) > > > > 2 cpu-migrations # 0.776 K/sec > > > > ( +- 25.00% ) > > > > 209 page-faults # 0.101 M/sec > > > > ( +- 0.34% ) > > > > 7,053,686 cycles # 3.419 GHz > > > > ( +- 2.03% ) > > > > 6,893,783 instructions # 0.98 insn per cycle > > > > ( +- 0.25% ) > > > > 1,342,492 branches # 650.735 M/sec > > > > ( +- 0.20% ) > > > > 29,390 branch-misses # 2.19% of all > > branches > > > > ( +- 0.61% ) > > > > 0.225597302 seconds time elapsed > > > > ( +- 6.68% ) > > ~~~~~ > > > > This is without any LTTng session configured. If I enable LTTng kernel and > > userspace events, this becomes even worse: > > > > ~~~~~ > > ┌milian@milian-kdab2:/tmp > > └$ cat $(which run_lttng_trace.sh) > > #!/bin/sh > > > > if [ -z "$(pidof lttng-sessiond)" ]; then > > > > sudo lttng-sessiond --daemonize > > > > fi > > > > sudo lttng create -o ~/lttng-traces/$(date -Iseconds) > > sudo lttng enable-channel kernel -k --subbuf-size 16M --num-subbuf 8 > > sudo lttng enable-event -c kernel -k -a > > sudo lttng enable-channel ust -u --subbuf-size 16M --num-subbuf 8 > > sudo lttng enable-event -c ust -u lttng_ust_tracef:* > > sudo lttng start > > $@ > > sudo lttng stop > > > > sudo chmod a+rx -R ~/lttng-traces > > ┌milian@milian-kdab2:/tmp > > └$ run_lttng_trace.sh perf stat -r 5 ./a.out > > Session auto-20170905-162818 created. > > Traces will be written in > > /home/milian/lttng-traces/2017-09-05T16:28:18+02:00 Kernel channel kernel > > enabled for session auto-20170905-162818 > > All Kernel events are enabled in channel kernel > > UST channel ust enabled for session auto-20170905-162818 > > UST event lttng_ust_tracef:* created in channel ust > > Tracing started for session auto-20170905-162818 > > > > Performance counter stats for './a.out' (5 runs): > > 4.937820 task-clock (msec) # 0.005 CPUs utilized > > > > ( +- 1.28% ) > > > > 117 context-switches # 0.024 M/sec > > > > ( +- 0.70% ) > > > > 3 cpu-migrations # 0.608 K/sec > > > > ( +- 23.57% ) > > > > 245 page-faults # 0.050 M/sec > > > > ( +- 0.20% ) > > > > 16,541,355 cycles # 3.350 GHz > > > > ( +- 0.94% ) > > > > 20,611,637 instructions # 1.25 insn per cycle > > > > ( +- 0.23% ) > > > > 3,580,525 branches # 725.123 M/sec > > > > ( +- 0.19% ) > > > > 52,093 branch-misses # 1.45% of all > > branches > > > > ( +- 1.73% ) > > > > 0.965545882 seconds time elapsed > > > > ( +- 16.96% ) > > > > Waiting for data availability... > > Tracing stopped for session auto-20170905-162818 > > ~~~~~ > > > > I see this ~1s delay during startup in nearly all test applications where > > I > > wanted to use lttng-ust. In tracecompass, and according to perf sleep time > > profiling, this seems to be mostly due to extensive sleeping on mutex > > locks > > (from ust_lock in liblttng-ust). With perf trace, I also see membarrier > > taking a lot of time: > > > > ~~~~~ > > # without a running lttng session: > > ┌milian@milian-kdab2:/tmp > > └$ perf trace --duration 1 ./a.out > > > > 6.492 (52.468 ms): a.out/23672 recvmsg(fd: 3<socket:[1178439]>, msg: > > 0x7fbe2fbb1070 ) = 1 > > > > 5.077 (54.271 ms): a.out/23671 futex(uaddr: 0x7fbe30d508a0, op: > > WAIT_BITSET|PRIV|CLKRT, utime: 0x7ffc474ff5a0, val3: 4294967295) = 0 > > > > 59.598 (79.379 ms): a.out/23671 membarrier(cmd: 1 > > > > ) = 0 > > > > 138.984 (39.945 ms): a.out/23671 membarrier(cmd: 1 > > > > ) = 0 > > > > 59.621 (119.324 ms): a.out/23672 futex(uaddr: 0x7fbe303e54c0, op: WAIT| > > > > PRIV, val: 2 ) = 0 > > > > 179.045 (79.918 ms): a.out/23671 membarrier(cmd: 1 > > > > ) = 0 > > > > 258.971 (39.997 ms): a.out/23671 membarrier(cmd: 1 > > > > ) = 0 > > > > 299.052 (79.883 ms): a.out/23671 membarrier(cmd: 1 > > > > ) = 0 > > > > 378.942 (59.988 ms): a.out/23671 membarrier(cmd: 1 > > > > ) = 0 > > > > 439.022 (69.908 ms): a.out/23671 membarrier(cmd: 1 > > > > ) = 0 > > > > 508.937 (80.027 ms): a.out/23671 membarrier(cmd: 1 > > > > ) = 0 > > > > # with a running lttng session: > > ┌milian@milian-kdab2:/tmp > > └$ run_lttng_trace.sh perf trace --duration 1 ./a.out > > > > 6.666 (392.496 ms): a.out-ust/23790 recvmsg(fd: 3<socket:[1185362]>, msg: > > 0x7fcd81d8a0b0 ) = 612 > > > > 400.238 (68.452 ms): a.out-ust/23790 membarrier(cmd: 1 > > > > ) = 0 > > > > 468.698 (39.983 ms): a.out-ust/23790 membarrier(cmd: 1 > > > > ) = 0 > > > > 510.567 (98.113 ms): a.out-ust/23790 recvmsg(fd: 3<socket:[1185362]>, msg: > > 0x7fcd81d8a070 ) = 1 > > > > 6.467 (603.500 ms): a.out/23789 futex(uaddr: 0x7fcd82f298a0, op: > > WAIT_BITSET|PRIV|CLKRT, utime: 0x7fff4df6c970, val3: 4294967295) = 0 > > > > 610.020 ( 6.307 ms): a.out/23789 futex(uaddr: 0x7fcd82f29960, op: WAIT| > > > > PRIV, val: 2 ) = 0 > > > > 616.446 (62.265 ms): a.out/23789 membarrier(cmd: 1 > > > > ) = 0 > > > > 678.721 (59.916 ms): a.out/23789 membarrier(cmd: 1 > > > > ) = 0 > > > > 616.332 (122.335 ms): a.out-ust/23790 futex(uaddr: 0x7fcd82f29960, op: > > WAIT|PRIV, val: 2 ) = 0 > > > > 738.976 (49.658 ms): a.out/23789 membarrier(cmd: 1 > > > > ) = 0 > > > > 788.642 (49.987 ms): a.out/23789 membarrier(cmd: 1 > > > > ) = 0 > > > > 739.014 (99.644 ms): a.out-ust/23790 futex(uaddr: 0x7fcd825be4c0, op: > > WAIT| > > > > PRIV, val: 2 ) = 0 > > > > 838.651 (49.977 ms): a.out/23789 membarrier(cmd: 1 > > > > ) = 0 > > > > 888.636 (69.983 ms): a.out/23789 membarrier(cmd: 1 > > > > ) = 0 > > > > 838.662 (119.978 ms): a.out-ust/23790 futex(uaddr: 0x7fcd825be4c0, op: > > WAIT|PRIV, val: 2 ) = 0 > > > > 958.739 (59.873 ms): a.out/23789 membarrier(cmd: 1 > > > > ) = 0 > > > > 1018.621 (80.058 ms): a.out/23789 membarrier(cmd: 1 > > > > ) = 0 > > > > 1098.782 (49.844 ms): a.out/23789 membarrier(cmd: 1 > > > > ) = 0 > > > > 1148.634 (59.998 ms): a.out/23789 membarrier(cmd: 1 > > > > ) = 0 > > > > 1208.731 (69.940 ms): a.out/23789 membarrier(cmd: 1 > > > > ) = 0 > > > > 1278.680 (79.947 ms): a.out/23789 membarrier(cmd: 1 > > > > ) = 0 > > > > 1358.727 (39.881 ms): a.out/23789 membarrier(cmd: 1 > > > > ) = 0 > > > > 1398.616 (39.980 ms): a.out/23789 membarrier(cmd: 1 > > > > ) = 0 > > > > 6.640 (1432.157 ms): a.out-ust/23791 futex(uaddr: 0x7fcd82f90000 > > > > ) = -1 (null) INTERNAL ERROR: strerror_r(512, 0x7ffdbc5467a0, 128)=22 > > ~~~~~ > > > > This makes lttng-ust pretty much unusable for any short-lived application. > > Is this a known limitation? > > This kind of delay is not completely unexpected, but a bit higher than I > would expect. I'm interested in figuring out if most of this delay happens > to be caused by sys_membarrier, either directly, or indirectly (through a > lock).
See my other email which adds some harder numbers from a perf sleep record. > > I'm using lttng-ust on Arch from the AUR packages, i.e. LTTng 2.9.4. I've > > seen this problem also on Ubuntu 17.04 though I didn't do any of the > > above measurements there. So far, I suspected the ust statedump to take > > too much time, but I disabled that one for the measurements above. > > > > Any help would be appreciated > > Can you reproduce with a Linux kernel that has CONFIG_MEMBARRIER=n and > share the performance results ? I'd have to recompile the kernel, which is something I don't like to do. Why is this required, or put differently - why would the kernel option influence the userspace LTTng library? Thanks -- Milian Wolff | [email protected] | Senior Software Engineer KDAB (Deutschland) GmbH&Co KG, a KDAB Group company Tel: +49-30-521325470 KDAB - The Qt Experts
smime.p7s
Description: S/MIME cryptographic signature
_______________________________________________ lttng-dev mailing list [email protected] https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
