> From somewhere, I saw CPU cycles % with LTTng is even 1-2% lower than vanilla.

You might be referring to section 5.3 of [1], last paragraph.

[1] https://www.dorsal.polymtl.ca/en/system/files/desnoyers.pdf

In any case, there is always an overhead when actively tracing (recording
to disk, to the network, or even in flight recorder mode).

> Is it contributed by dynamic branch prediction or other instruction level
> optimization?

Note that these numbers were for a 2.6.30 kernel. Things might have changed
a bit since.

As far as I know, the performance of an instrumented kernel is independent
of lttng, since the kernel instrumentation is used not only by lttng but by
multiple projects. But it might help convince system engineers to turn on
instrumentation for their kernel.

Anyhow, the speedup was attributed to the change in the instruction and data
cache layout.

> In that way, the kernel or at least key libraries like libc is
> kind of re-compiled, right?

lttng-modules does not require any kernel recompilation. It uses kernel
modules to function.

Instrumenting libc directly might not be ideal due to its nature. You can use
LD_PRELOAD and lttng-ust to shim and instrument the libc functions you are
interested in. For example, we already ship a shim for some libc functions
[2] that you can LD_PRELOAD.

[2] https://github.com/lttng/lttng-ust/tree/9d4d2a639afc19a1bd705ea560782917ac892596/liblttng-ust-libc-wrapper

Hope this helps.

Cheers
--
Jonathan Rajotte-Julien
EfficiOS
_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev