Hi Frank,

* Frank Ch. Eigler ([email protected]) wrote:
> Julien Desfossez <[email protected]> writes:
>
> > LTTng-UST vs SystemTap userspace tracing benchmarks
>
> Thank you.
>
> > [...] For flight recorder tracing, UST is 289 times faster than
> > SystemTap on an 8-core system with a LTTng kernel and 279 times with
> > a vanilla+utrace kernel.
>
> This is not that surprising, considering how the two tools work. UST
> does its work in userspace,
This first part of the statement is true,

> and is therefore focused on an individual
> process's activities.

This is incorrect. LTTng and UST gather traces from multiple processes
and from the kernel, and merge them in post-processing. This toolset is
therefore focused on system-wide activity analysis.

> Systemtap does its work in kernelspace, and can
> therefore focus on many different processes and the kernel at the same
> time.

This entails some ring transitions. The difference between UST and
SystemTap is not the target goal, but rather where the computation is
done: UST uses buffering to send its trace output; SystemTap, conversely,
performs a ring transition for each individual event. This is a core
design difference that partly explains the dramatic performance
difference we see here.

> (One may imagine a future version of systemtap where scripts that
> happen to independently probe single processes are executed with a
> pure userspace backend, but this is not in our immediate roadmap.)
>
> > SystemTap does not scale for multithreaded applications running on
> > multi-core systems. [...]
>
> We know of at least one kernel problem in this area,
> <http://sourceware.org/PR5660>, which may be fixable via core or
> utrace or uprobes changes.
>
> > This study proves that LTTng-UST and SystemTap are two tools with a
> > complementary purpose. [...]
>
> Strictly speaking, it shows that their performance differs
> dramatically in this sort of microbenchmark.

Strictly speaking, you are right. I have done performance testing of
LTTng (the kernel equivalent of UST, using very similar technology) on
real workloads traced at the kernel level, and this kind of
microbenchmark actually shows a lower bound of the tracer's performance
impact per probe (the upper bound being up to a factor of 3 higher due
to cache misses in the trace buffers). All the details are presented in
http://lttng.org/pub/thesis/desnoyers-dissertation-2009-12.pdf,
Chapters 5.5, 8.4 and 8.5.
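To make that design difference concrete, here is a minimal toy sketch
(plain Python, not actual LTTng-UST or SystemTap code; the tracer
classes, buffer size, and event counts are invented for illustration).
The buffered tracer pays one simulated ring transition per full buffer,
while the per-event tracer pays one per event:

```python
class BufferedTracer:
    """UST-style sketch: events accumulate in a user-space buffer;
    only a full buffer triggers a (simulated) ring transition."""
    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.buffer = []
        self.ring_transitions = 0

    def trace(self, event):
        self.buffer.append(event)
        if len(self.buffer) == self.buffer_size:
            self.flush()

    def flush(self):
        # One transition moves the whole batch out of user space.
        self.ring_transitions += 1
        self.buffer.clear()


class PerEventTracer:
    """Per-event sketch (for this comparison): every event pays a
    (simulated) ring transition."""
    def __init__(self):
        self.ring_transitions = 0

    def trace(self, event):
        self.ring_transitions += 1


buffered = BufferedTracer(buffer_size=100)
per_event = PerEventTracer()
for i in range(1000):
    buffered.trace(i)
    per_event.trace(i)

print(buffered.ring_transitions)   # 10 transitions for 1000 events
print(per_event.ring_transitions)  # 1000 transitions
```

Same event stream, two orders of magnitude fewer transitions for the
buffered design; the real per-event transition cost is what these
microbenchmarks end up measuring.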
Now the overall performance impact must indeed be weighted by the number
of times the tracer is called by the application. If, for example, we
trace standard tests like "dbench" at the kernel level with LTTng, we
get a 3% performance hit. Multiplying this by 294 puts us in the area
of an 882% performance hit on the system, which is likely to have a
noticeable impact on the end user experience.

> Thank you for your data gathering.

Thanks for your reply. We'll be glad to help out if we can.

Mathieu

> - FChE

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

_______________________________________________
ltt-dev mailing list
[email protected]
http://lists.casi.polymtl.ca/cgi-bin/mailman/listinfo/ltt-dev
