Hi,

I'm finishing my master's degree and I want to share my results on what can be done using STM, a hardware tracer present on newer ARM-based platforms. It really speeds up tracing, but it requires a lot of changes if we want LTTng to support it with something like:
    $ lttng use-hardware stm

Most debugging hardware units provide execution tracing (i.e. recording every branch taken during execution), which is not adapted to event tracing. However, other hardware solutions exist to accelerate event recording: they provide dedicated resources for writing and timestamping events. Currently, two hardware modules provide such functionality:
- "Data Acquisition Messages" on Freescale QorIQ processors;
- "System Trace Module" on some ARM-based chips (called "System Trace Macrocell" in newer versions).

Since Data Acquisition Messages are not publicly documented, the following only concerns STM.

Tests we made on a Pandaboard showed that using STM to record small UST tracepoints is 10x faster than using LTTng-UST. Also, STM has many independent "channels", so in most situations locks are not needed.

However, the characteristics of this hardware make special handling necessary. In particular, the following are needed:
- a new way of writing traces *sequentially*;
- a new type of trace consumer (to deal with hardware buffers);
- a new trace converter (STM format -> CTF).

About sequential trace
----------------------
Writing to STM must be sequential: to store a 4-integer array, you need to write 4 times to the same address (a minimal sketch of such a write is included at the end of this message). Thus, the classic "channel0_%d" files cannot be written to STM. First, because they are not written sequentially: the header is updated once a full page has been written. Second, because these files are full of padding and spaces, which would be really inefficient with STM, since every one of these zeros would have to be written (instead of just moving a pointer, as libringbuffer does).

Another STM capability is to timestamp messages automatically using a hardware clock, so LTTng software timestamping would not be needed when tracing through STM. I don't know how much speedup can be expected from this. Since the STM clock and the LTTng clock (RDTSC) are different, a synchronization mechanism is needed (for instance, sending a signal packet containing the RDTSC value into STM every second).

About trace consuming
---------------------
STM output is stored in a special buffer called ETB. This buffer can either be read from the host system, or remotely drained from another computer attached with JTAG. Both cases have users: some want to export the trace via JTAG to reduce overhead, others don't have a JTAG connector. So both cases should be supported.

My way of doing things would be a stand-alone trace consumer, not part of the LTTng consumers. This way, both the on-chip and off-chip use cases would result in a raw trace file, either created on the traced host or retrieved by the monitoring machine.

About STM trace format
----------------------
A trace converter would then produce CTF from this raw trace file. The STM format (called STP) is too unusual to be described directly as CTF. Also, processing STP is quite CPU-intensive, so real-time decoding is not a good idea: it would be a burden on performance. Decoding at analysis time should be preferred.

The STP format is not publicly documented, but I wrote code to decode it [1]. It could be reused as a base for a decoder, for example a "stp2ctf" babeltrace plugin.

Comments are welcome to answer these two questions:
1. Is it worth supporting STM in LTTng?
2. How do we define a new sequential trace format that is forward-compatible with future event-tracing hardware?
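PS: to make the "sequential write" idea more concrete, here is a minimal sketch in C of what emitting one event through a memory-mapped STM stimulus port could look like. The base address, mapping size, per-channel stride and register layout below are hypothetical placeholders for illustration only; the real values depend on the SoC (e.g. OMAP4430) and must be taken from its reference manual.

/*
 * Sketch only: illustrates the sequential-write constraint, not a
 * working driver.  STM_PORT_BASE, STM_PORT_SIZE and STM_CH_STRIDE are
 * made-up values.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define STM_PORT_BASE  0x54000000UL   /* hypothetical stimulus-port base  */
#define STM_PORT_SIZE  0x1000UL       /* hypothetical size of mapped area */
#define STM_CH_STRIDE  0x100UL        /* hypothetical per-channel stride  */

/* One channel per writer, so no lock is needed between writers. */
static volatile uint32_t *stm_channel(void *ports, unsigned int channel)
{
        return (volatile uint32_t *)((char *) ports + channel * STM_CH_STRIDE);
}

/*
 * Emit a small event: every payload word is written, one after the
 * other, to the *same* memory-mapped address; the hardware packetizes
 * and timestamps it.  There is no ring-buffer cursor to maintain.
 */
static void stm_write_event(volatile uint32_t *ch, uint32_t event_id,
                            const uint32_t *payload, size_t len)
{
        size_t i;

        *ch = event_id;          /* header word                        */
        for (i = 0; i < len; i++)
                *ch = payload[i];/* sequential writes, same address    */
}

int main(void)
{
        uint32_t args[4] = { 1, 2, 3, 4 };
        int fd = open("/dev/mem", O_RDWR | O_SYNC);
        void *ports;

        if (fd < 0) {
                perror("open");
                return 1;
        }
        ports = mmap(NULL, STM_PORT_SIZE, PROT_WRITE, MAP_SHARED,
                     fd, STM_PORT_BASE);
        if (ports == MAP_FAILED) {
                perror("mmap");
                close(fd);
                return 1;
        }

        /* A 4-integer event really is 4 (+1) stores to one address. */
        stm_write_event(stm_channel(ports, 0), 42, args, 4);

        munmap(ports, STM_PORT_SIZE);
        close(fd);
        return 0;
}

This is why the existing "channel0_%d" layout cannot be reused as-is: there is nothing like "reserve space, then go back and patch a header" with STM, only a stream of stores to one address per channel.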
Thanks,
Adrien Vergé

[1]: https://github.com/adrienverge/libcoresightomap4430
