On Mon, Oct 20, 2014 at 7:48 PM, Boisvert, Sebastien <[email protected]> wrote:
> > From: Eugene Ivanov [[email protected]] > > Sent: Monday, October 20, 2014 6:01 AM > > To: Boisvert, Sebastien > > Cc: [email protected] > > Subject: Re: [lttng-dev] Best way to analyze CTF files > > > > > > > > Hi Sebastian, > > *Sebastien* or *Sébastien* > Oops, I'm sorry, Sebastien. > > > > In our company one of the main lttng applications is to measure latency > between two probes which get same ID as argument. > > We do the same thing with LTTng (our ID is called message_number, and > every actor message carries one): > > > For this purpose we've implemented in C a special utility which is based > on libraries to read CTF (IIRC). Also some developers often > > use own simple perl scripts to analyze babeltrace output. > > In our system, messages are sent by actors (like Erlang processes, but in > C 1999 for throughput). > > We ended up using LTTng's powerful real-time filtering engine to filter > only messages > for a single actor: > > lttng enable-event --userspace thorium_message:* --filter > "message_source_actor == 1001563" > > > Then we are simply just looking at individual messages (LTTng provides us > with the delivery path of > every actor message for the selected actor): > > [boisvert@bigmem biosal]$ cat trace.txt | grep "message_number = 99,"|awk > '{print $4" "$2}' > thorium_message:actor_send: (+0.003747706) > thorium_message:worker_send: (+0.000001442) > thorium_message:worker_enqueue_message: (+0.000001257) > thorium_message:worker_publish_message: (+0.000000976) > thorium_message:worker_pool_dequeue: (+0.003363601) > thorium_message:node_send: (+0.000001483) > thorium_message:node_send_system: (+0.000001349) > thorium_message:transport_send: (+0.000001040) > thorium_message:node_receive: (+0.000073889) > thorium_message:node_dispatch_message: (+0.000001891) > thorium_message:worker_pool_enqueue: (+0.000001696) > thorium_message:worker_receive: (+0.000001774) > thorium_message:actor_receive: (+0.000002841) > > > > I wish there is a better way to implement custom CTF analyzers, probably > some basic DSL. > > I think that an approach that would be efficient would be to just import > everything in > sqlite. Otherwise, I think that filtering is good since it does generates > files that are too large. > > Without filtering, our HPC app (supercomputing, not cloud) generates ~30 > GiB worth of > LTTng CTF data in 2 minutes with only 4 Thorium nodes (~ 30 x86-64 cores). > In production, we are using > ~6000 x86 cores on a Cray XE6 for ~20 minutes and ~30000 POWER cores (A2) > for ~ 20 minutes. > So LTTng's builtin filter is really our only option IMO. > I see. > > > > > > > > On Sat, Oct 18, 2014 at 2:23 AM, Boisvert, Sebastien > > <[email protected]> wrote: > > > > Bonjour, > > > > First, thank you for LTTng-UST. This is very useful and convenient. > > > > I just got started today using LTTng (LTTng-UST) for tracing a HPC > application > > that I am working on (I am a postdoc). I am impressed by how easy LTTng > is to use it. > > > > In my system, an actor message is represented by a pair > > <message_actor_source, message_number>. > > > > I want to list all messages that have a high delivery time > (message:actor_receive - message:actor_send). > > > > I am doing this to get the messages of one actor (actor 1000019): > > > > [boisvert@bigmem biosal]$ babeltrace > ~/lttng-traces/auto-20141017-181240|grep "message_source_actor = 1000019" > > actor_1000019 > > > > Then, I can look at one message with (message <1000019, 14>): > > > > [boisvert@bigmem biosal]$ grep "message_number = 14," actor_1000019 > > [18:12:43.647017211] (+0.000005110) > > bigmem.knoxville.kbase.us message:actor_send: { cpu_id = 30 }, { > message_number = 14, message_action = 31592, message_count = 8, > message_source_actor = 1000019, message_destination_actor = 1000059, > message_source_node = -1, message_destination_node = -1 > > } > > [18:12:43.647025249] (+0.000002860) > > bigmem.knoxville.kbase.us message:actor_receive: { cpu_id = 49 }, { > message_number = 14, message_action = 31592, message_count = 8, > message_source_actor = 1000019, message_destination_actor = 1000059, > message_source_node = 3, message_destination_node = > > 3 } > > > > If I substract the times: > > > > irb(main):003:0> (43.647025249-43.647017211)*10**9 > > => 8038.00000426236 > > > > This message (<1000019, 14>) required 8038 ns for the delivery. This one > is fine. > > > > > > So basically my question is: > > > > Is there an easy way to analyze these tracepoint files ? > > _______________________________________________ > > lttng-dev mailing list > > [email protected] > > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > > > > > > > > > > -- > > Eugene > > > > > -- Eugene
_______________________________________________ lttng-dev mailing list [email protected] http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
