----- On Sep 7, 2017, at 1:06 PM, Milian Wolff <[email protected]> wrote:
> On Thursday, September 7, 2017 6:29:58 PM CEST Jérémie Galarneau wrote:
>> On 7 September 2017 at 04:47, Milian Wolff <[email protected]> wrote:
>>> Hey all,
>>>
>>> after my latest report on liblttng-ust overhead yielded such quick
>>> results, I would like to expand on this with a more realistic workload
>>> that I'm looking at.
>>>
>>> We have two applications, one that sets up an SQLite database and then
>>> launches multiple client applications that try to read and write
>>> concurrently from that database. Of course, this will show file-level
>>> lock contention. I'd like to use LTTng to visualize the effect.
>>>
>>> On my machine I get the following numbers.
>>>
>>> 1) Without LTTng and without UST:
>>> ~~~~~
>>>  Performance counter stats for './launcher/lab_latency_launcher':
>>>
>>>         495.075367      task-clock (msec)    #    0.150 CPUs utilized
>>>              2,340      context-switches     #    0.005 M/sec
>>>                291      cpu-migrations       #    0.588 K/sec
>>>             52,265      page-faults          #    0.106 M/sec
>>>      3,098,760,038      cycles               #    6.259 GHz
>>>      1,894,074,743      instructions         #    0.61  insn per cycle
>>>        354,869,636      branches             #  716.799 M/sec
>>>         10,317,266      branch-misses        #    2.91% of all branches
>>>
>>>        3.298337574 seconds time elapsed
>>> ~~~~~~
>>>
>>> 2) With LTTng (all kernel events enabled) and without UST:
>>> ~~~~~
>>>  Performance counter stats for 'run_lttng_trace.sh ./launcher/lab_latency_launcher':
>>>
>>>         296.329694      task-clock (msec)    #    0.068 CPUs utilized
>>>              1,808      context-switches     #    0.006 M/sec
>>>                398      cpu-migrations       #    0.001 M/sec
>>>             53,493      page-faults          #    0.181 M/sec
>>>      2,411,871,165      cycles               #    8.139 GHz
>>>      2,370,005,487      instructions         #    0.98  insn per cycle
>>>        425,284,747      branches             # 1435.174 M/sec
>>>         10,442,662      branch-misses        #    2.46% of all branches
>>>
>>>        4.338938821 seconds time elapsed
>>> ~~~~~~
>>>
>>> 3) With LTTng (all kernel events enabled) and with UST
>>> `tracef()` events enabled (~400 UST events in total):
>>
>> We need more data to make sense of those numbers.
>>
>> What does the payload of those events look like? Using tracef(),
>> string formatting will most likely dominate over the time taken to
>> write to the buffers.
>
> I attached the complete source code for you to reproduce this yourself.
> Also, string formatting _does not_ dominate the time, otherwise I'd see
> more CPU utilization (cf. the first line of the perf stat output).
>
>> How many clients are you launching? What is their expected lifetime?
>
> 100 clients, 10 in parallel, each taking a couple of milliseconds. See
> the source code for yourself.

I've tried your program here, but cannot reproduce the overhead you
describe. What am I doing wrong? (FYI, I use the master branches of
lttng-ust, lttng-modules, lttng-tools, and liburcu.)

This is on an Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz, running Linux
4.9.0-3-amd64 4.9.30-2+deb9u2 (Debian).

* build-no-ust:

  time ./launcher/launcher > /dev/null
  real    0m1.752s
  user    0m0.264s
  sys     0m0.172s

* build-no-ust (with kernel syscalls traced):
  [ using: lttng enable-event -k --syscall -a ]

  time ./launcher/launcher > /dev/null
  real    0m1.700s
  user    0m0.212s
  sys     0m0.236s

* build-no-ust (with kernel tracepoints traced):
  [ using: lttng enable-event -k --tracepoint -a ]

  time ./launcher/launcher > /dev/null
  real    0m2.031s
  user    0m0.248s
  sys     0m0.332s

  -> This overhead is expected, given that we are tracing lots of kernel
     events. Selecting specific events instead would help make this
     overhead smaller.

* build-no-ust (with all kernel events, tracepoints + syscalls):
  [ using: lttng enable-event -k -a ]

  time ./launcher/launcher > /dev/null
  real    0m2.031s
  user    0m0.256s
  sys     0m0.332s

* build-with-ust (with the tracef UST event):
  [ using: lttng enable-event -u lttng_ust_tracef:event ]

  time ./launcher/launcher > /dev/null
  real    0m1.525s
  user    0m0.432s
  sys     0m0.360s

  lttng view | wc -l
  400

  -> 400 events collected.
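Milian's remark above that formatting cannot dominate leans on perf's "CPUs utilized" column, which is simply task-clock (CPU time consumed, in milliseconds) divided by wall-clock time. A quick check of the figures quoted earlier, sketched in shell with awk doing the floating-point math:

```shell
# "CPUs utilized" = task-clock (ms) / 1000 / elapsed wall-clock time (s).
# The numbers below come from the two perf stat runs quoted above.
awk 'BEGIN { printf "%.3f\n", (495.075367 / 1000) / 3.298337574 }'  # run 1 -> 0.150
awk 'BEGIN { printf "%.3f\n", (296.329694 / 1000) / 4.338938821 }'  # run 2 -> 0.068
```

Both runs consume well under a fifth of one CPU, so a CPU-bound string-formatting cost would show up directly in that column.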
* build-with-ust (with the tracef UST event and all kernel events):
  [ using: lttng enable-event -u lttng_ust_tracef:event
    and    lttng enable-event -k -a ]

  time ./launcher/launcher > /dev/null
  real    0m2.428s
  user    0m0.460s
  sys     0m0.892s

The main difference I see between your setup and mine is that I use
"time" instead of perf to run it. Also, I run lttng-sessiond as root;
is that the case for you as well?

Also, please note that lttng-modules exposes a /proc/lttng-logger file,
into which you can write from user space. Whatever is written there ends
up in the kernel trace. This could do the trick if, for some reason,
lttng-ust does not work for your scenario. However, the lttng-logger
proc file is just there as a convenience for scripts, and is by no means
a "fast" tracing interface.

Thanks,

Mathieu

> Thanks
>
> --
> Milian Wolff | [email protected] | Senior Software Engineer
> KDAB (Deutschland) GmbH & Co. KG, a KDAB Group company
> Tel: +49-30-521325470
> KDAB - The Qt Experts
> _______________________________________________
> lttng-dev mailing list
> [email protected]
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
_______________________________________________
lttng-dev mailing list
[email protected]
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
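For readers wanting to reproduce the runs above, the "[ using ... ]" annotations correspond to a session setup roughly along these lines. This is a sketch, not the exact script used: the session name and output path are made up, and it assumes a root lttng-sessiond with the lttng-modules loaded.

```shell
# Hypothetical reproduction of the combined run (UST tracef + all kernel
# events). Session name "bench" and the output path are illustrative.
lttng create bench --output=/tmp/bench-trace
lttng enable-event -u lttng_ust_tracef:event   # the tracef() UST event
lttng enable-event -k -a                       # all kernel tracepoints + syscalls
lttng start
time ./launcher/launcher > /dev/null
lttng stop
lttng view | wc -l                             # count of recorded events
lttng destroy
```

Dropping the `-k -a` line and keeping only the UST event gives the "build-with-ust" single-domain run; the reverse gives the kernel-only runs.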
