Hi Jerin,

Any idea why lttng performance is so poor?
I would have naturally gone there to benefit from the existing toolchain.

Have you looked at the FD.io logging/tracing infrastructure for inspiration?

Ray K

On 13/01/2020 10:40, Jerin Jacob Kollanukkaran wrote:
> Hi All,
> I would like to add tracing support for DPDK.
> I am planning to add this support in v20.05 release.
> This RFC attempts to get feedback from the community on
> a) Tracing Use cases.
> b) Tracing Requirements.
> c) Implementation choices.
> d) Trace format.
> Use-cases
> ---------
> - In most cases, the DPDK provider will not have access to the DPDK
> customer applications.
> To debug/analyze the slow path and fast path DPDK API usage from the field,
> we need to have integrated trace support in DPDK.
> - Need a low-overhead, fast-path, multi-core PMD debugging/analysis
> infrastructure in DPDK to fix the functional and performance issue(s) of PMDs.
> - Post trace analysis tools can provide various status across the system such
> as cpu_idle() using the timestamp added in the trace.
> Requirements:
> -------------
> - Support for Linux, FreeBSD and Windows OS
> - Open trace format
> - Multi-platform Open source trace viewer
> - Absolute low overhead trace API for DPDK fast path tracing/debugging.
> - Dynamic enable/disable of trace events
> To enable trace support in DPDK, the following items need to be worked out:
> a) Add the DPDK trace points in the DPDK source code.
> - This includes updating DPDK functions such as
> rte_eth_dev_configure(), rte_eth_dev_start() and rte_eth_rx_burst() to emit
> the trace.
> b) Choosing a suitable serialization format
> - Common Trace Format, CTF, is an open format and language to describe trace 
> formats.
> This enables tool reuse, of which line-textual (babeltrace) and 
> graphical (TraceCompass) variants already exist.
> CTF should look familiar to C programmers but adds stronger typing. 
> See CTF - A Flexible, High-performance Binary Trace Format.
> https://diamon.org/ctf/
> c) Writing the on-target serialization code,
> See the section below (Lttng CTF trace emitter vs DPDK specific CTF trace
> emitter).
> d) Deciding on and writing the I/O transport mechanics,
> For performance reasons, it should be backed by a hugepage and written out
> with file I/O.
> e) Writing the PC-side deserializer/parser,
> Both babeltrace (CLI tool) and Trace Compass (GUI tool) support CTF.
> See: 
> https://lttng.org/viewers/
> f) Writing tools for filtering and presentation.
> See item (e)
> Lttng CTF trace emitter vs DPDK specific CTF trace emitter
> ----------------------------------------------------------
> I have written a performance evaluation application to measure the overhead
> of the Lttng CTF emitter (the fast-path infrastructure used by the
> https://lttng.org/ library to emit the trace):
> https://github.com/jerinjacobk/lttng-overhead
> https://github.com/jerinjacobk/lttng-overhead/blob/master/README
> I could improve the performance by 30% by adding a "DPDK"
> based plugin for get_clock() and get_cpu().
> Here are the performance numbers after adding the plugin on
> x86 and the various arm64 boards that I have access to:
> On high-end x86, it comes around 236 cycles/~100ns @ 2.4GHz (See the last 
> line in the log(ZERO_ARG)) 
> On arm64, it varies from 312 cycles to 1100 cycles (based on the class of CPU).
> In short, based on the "IPC capabilities", the cost would be around 100ns to
> 400ns for a single void trace (a trace without any argument).
> [lttng-overhead-x86] $ sudo ./calibrate/build/app/calibrate -c 0xc0
> make[1]: Entering directory '/export/lttng-overhead-x86/calibrate'
> make[1]: Leaving directory '/export/lttng-overhead-x86/calibrate'
> EAL: Detected 56 lcore(s)
> EAL: Detected 2 NUMA nodes
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'PA'
> EAL: Probing VFIO support...
> EAL: PCI device 0000:01:00.0 on NUMA socket 0
> EAL:   probe driver: 8086:1521 net_e1000_igb
> EAL: PCI device 0000:01:00.1 on NUMA socket 0
> EAL:   probe driver: 8086:1521 net_e1000_igb
> CPU Timer freq is 2600.000000MHz
> NOP: cycles=0.194834 ns=0.074936
> GET_CLOCK: cycles=47.854658 ns=18.405638
> GET_CPU: cycles=30.995892 ns=11.921497
> ZERO_ARG: cycles=236.945113 ns=91.132736
> We will have only 16.75ns per packet to process 59.2 mpps (40Gbps), so IMO the
> Lttng CTF emitter may not fit the DPDK fast path purpose due to the cost
> associated with generic Lttng features.
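
For reference, the per-packet budget quoted above can be reproduced with a few lines of arithmetic. This is only a rough sketch: the 64-byte frame size and the 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter, inter-frame gap) are assumptions, not stated in the mail, and they give a budget of about 16.8ns, close to the 16.75ns figure above. The cycle count and timer frequency are taken from the ZERO_ARG and "CPU Timer freq" lines of the log.

```python
# Assumptions (not from the original mail): minimum-size 64-byte frames and
# 20 bytes of per-frame wire overhead (preamble, SFD, inter-frame gap).
LINK_BPS = 40e9
FRAME_BYTES = 64
WIRE_OVERHEAD_BYTES = 20


def packet_rate_pps(link_bps, frame_bytes, overhead_bytes=WIRE_OVERHEAD_BYTES):
    """Packets per second a link can carry for a given frame size."""
    return link_bps / ((frame_bytes + overhead_bytes) * 8)


def cycles_to_ns(cycles, freq_hz):
    """Convert a CPU cycle count to nanoseconds at the given clock frequency."""
    return cycles / freq_hz * 1e9


pps = packet_rate_pps(LINK_BPS, FRAME_BYTES)
budget_ns = 1e9 / pps                           # time available per packet
zero_arg_ns = cycles_to_ns(236.945113, 2.6e9)   # values from the log above
ratio = zero_arg_ns / budget_ns                 # trace cost vs. packet budget

print(f"rate={pps / 1e6:.2f} Mpps budget={budget_ns:.2f} ns "
      f"zero_arg={zero_arg_ns:.2f} ns ratio={ratio:.1f}x")
```

Under these assumptions a single zero-argument trace costs several times the whole per-packet budget, which is the point being made above.
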
> One option could be to have a native CTF emitter in EAL/DPDK that emits the
> trace into a hugepage. I think it would cost only a handful of cycles if we
> limit the features to the requirements above.
> The upside of using the Lttng CTF emitter:
> a) No need to write a new CTF trace emitter (item (c) above).
> The downsides of using the Lttng CTF emitter:
> a) Performance issue (see above).
> b) Lack of Windows OS support. It looks like it has basic FreeBSD support.
> c) DPDK library dependency on lttng for tracing.
> So it is probably good to have a native CTF emitter in DPDK and reuse the
> open-source trace viewers (babeltrace and TraceCompass) and format (CTF)
> infrastructure.
> I think it would be the best of both worlds.
> Any thoughts on this subject? Based on the community feedback, I can work on 
> the patch for v20.05.
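
To make the native-emitter idea concrete, here is a minimal sketch of the pattern described above: fixed-size binary records appended to a preallocated buffer, with a per-event enable check. Everything in it is hypothetical. The TraceBuffer class, the record layout (64-bit timestamp plus 32-bit event id) and the drop-when-full policy are assumptions for illustration, the bytearray merely stands in for a hugepage-backed area, and a real CTF stream would also need packet headers and a metadata description. It is written in Python for brevity rather than the C a DPDK implementation would use.

```python
import struct
import time

# Hypothetical record layout (an assumption, not the CTF spec and not DPDK
# code): little-endian 64-bit timestamp followed by a 32-bit event id.
RECORD = struct.Struct("<QI")


class TraceBuffer:
    """Preallocated buffer standing in for a hugepage-backed trace area."""

    def __init__(self, size):
        self.mem = bytearray(size)
        self.offset = 0
        self.enabled = set()  # event ids that are currently switched on

    def emit(self, event_id, timestamp=None):
        """Append one record; return False if the event is off or no room."""
        if event_id not in self.enabled:
            return False  # dynamically disabled event: do nothing
        if self.offset + RECORD.size > len(self.mem):
            return False  # buffer full: drop the event in this sketch
        if timestamp is None:
            timestamp = time.monotonic_ns()
        RECORD.pack_into(self.mem, self.offset, timestamp, event_id)
        self.offset += RECORD.size
        return True

    def records(self):
        """Decode everything written so far as (timestamp, event_id) pairs."""
        return [RECORD.unpack_from(self.mem, off)
                for off in range(0, self.offset, RECORD.size)]


buf = TraceBuffer(4096)
buf.enabled.add(1)   # enable a hypothetical event id 1
buf.emit(1)          # recorded
buf.emit(2)          # ignored, event id 2 is not enabled
print(len(buf.records()))
```

The hot path here is a set lookup, a bounds check and one fixed-size store, which is the kind of restricted feature set the RFC argues a fast-path emitter should be limited to.
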
