Hi Anders, Yes, please file a feature request against Babeltrace on the bug tracker pointing the this email thread and describing the feature.
Thanks! Mathieu ----- Original Message ----- > From: "Anders Wallin" <[email protected]> > To: "Mathieu Desnoyers" <[email protected]> > Cc: [email protected] > Sent: Monday, November 17, 2014 10:45:37 AM > Subject: Re: [lttng-dev] snapshot data corrupted after the output buffer is > filled > Thx Mathieu, > if I'm running my test on one cpu (taskset -c 1 ./test.run) it works as > expected and described by you. > The babeltrace flag would be a good solution! Do you want me to file a > bug/enhancement on this? > Regards > Anders Wallin > On Sat, Nov 15, 2014 at 6:59 PM, Mathieu Desnoyers < > [email protected] > wrote: > > > From: "Anders Wallin" < [email protected] > > > > > > > To: [email protected] > > > > > > Sent: Friday, November 14, 2014 8:28:18 AM > > > > > > Subject: [lttng-dev] snapshot data corrupted after the output buffer is > > > filled > > > > > > Hi, > > > > > > it looks like snapshot events became faulty after the output max size is > > > reached. > > > > > > The next snapshot get mixed events from different buffers. I have tested > > > it > > > on 2.3, 2.4 and 2.5 > > > > > > and it's in all of them, but it hits earlier in 2.5 due to the problem > > > reported in > > > > > > http://lists.lttng.org/pipermail/lttng-dev/2014-November/023773.html > > > > > > There should never be more then 2 HUNKS in a diff between 2 consecutive > > > snapshots, here I got > > > > > > 11 as the worst case. > > > > > Hi Anders, > > > This is not a bug, rather just per-cpu buffering at work.It does have some > > > rather non-obvious consequences in terms of snapshot event flow. > > > To understand this, we need to keep in mind that LTTng does everything > > > in per-cpu buffers, and that overwrite of oldest events in buffer full > > conditions > > > happens per-cpu. > > > The use-case you show below really makes sense when you start to consider > > > on which CPU (and thus per-cpu buffer) the application is running. > > Initially, > > > the first app runs on CPU 6. As long as the following applications run on > > > other cpus, we keep the same first event. Then when enough events are > > > written in cpu 6 buffer to overwrite the first event, we see a first event > > change. > > > Then, depending on the scheduling of applications, older parts of per-cpu > > > buffers are overwritten, and others are not (if there are less events in > > those > > > buffer). This means some per-cpu buffers will have events going further > > back > > > in time compared to other higher-throughput CPUs. > > > This is why you see more than 2 hunks in your diff. Babeltrace merges those > > > per-cpu buffers back into a single text output, and the fact that some > > events > > > have been overwritten for a time-range for high-throughput buffers, but not > > > for low-throughput buffers, make it appear as if events are missing in the > > > stream if your model is a single system-wide buffer, but the missing events > > > make sense if you see it as a merge of per-cpu buffers. > > > Our current behavior is to gather all available data when taking a > > snapshot, > > > and babeltrace behavior is to show all events available in the streams. We > > > could perhaps consider a different babeltrace behavior (a flag) that would > > > allow it to only show the events for the intersection of time range of all > > > streams, based on the packet header timestamp begin/end. This would > > > take care of hiding events that happen in low-throughput streams when > > > there is no intersection with snapshotted high-throughput streams. > > > And perhaps this should be the default ? > > > Thanks! > > > Mathieu > > > > TESTCASE; > > > > > > tracetest code: > > > > > > --------------------------------------------------------------- > > > > > > #include <stdio.h> > > > > > > #include "tracetest-tp.h" > > > > > > int main(int argc, char* argv[]) > > > > > > { > > > > > > int i; > > > > > > for (i = 0; i < 1000; i++) { > > > > > > tracepoint(tracetest, first_tp, i, "test"); > > > > > > } > > > > > > return 0; > > > > > > } > > > > > > --------------------------------------------------------------- > > > > > > test script: > > > > > > --------------------------------------------------------------- > > > > > > #!/bin/bash > > > > > > : ${NO_OF_SUBBUF:=6} > > > > > > : ${SUBBUF_SIZE:=4096} > > > > > > : ${KERNEL_TRACE:=NO} > > > > > > : ${LOOPS:=10} > > > > > > SESSION=test > > > > > > CHANNEL1=ch1 > > > > > > CHANNEL2=ch2 > > > > > > CH2_SUBBUF_SIZE=$(($SUBBUF_SIZE*$NO_OF_SUBBUF)) > > > > > > LTTNG='lttng -n' > > > > > > killall lttng-consumerd > /dev/null 2>&1 > > > > > > killall lttng-sessiond > /dev/null 2>&1 > > > > > > if [ -z $DEBUG ]; then > > > > > > lttng-sessiond --no-kernel -d > > > > > > else > > > > > > lttng-sessiond -v --no-kernel --verbose-consumer -b > > > > > > fi > > > > > > sleep 1 > > > > > > if [ ! -z $DEBUG ]; then > > > > > > export LTTNG_UST_DEBUG=1 > > > > > > fi > > > > > > rm -rf $HOME/lttng-traces > > > > > > rm snapshot/*.bt > > > > > > $LTTNG --version > > > > > > $LTTNG create $SESSION --snapshot > > > > > > $LTTNG enable-channel -u --subbuf-size $SUBBUF_SIZE --num-subbuf > > > $NO_OF_SUBBUF --overwrite $CHANNEL1 > > > > > > $LTTNG enable-channel -u --subbuf-size $CH2_SUBBUF_SIZE --num-subbuf 2 > > > --overwrite $CHANNEL2 > > > > > > $LTTNG enable-event -u -c $CHANNEL1 "*" > > > > > > $LTTNG start > > > > > > $LTTNG list $SESSION > > > > > > mkdir -p snapshot/ > > > > > > h=1 > > > > > > for i in $(seq 1 $LOOPS); do > > > > > > $APP > > > > > > $LTTNG snapshot record > > > > > > # print no of events and first and last event > > > > > > last=$(ls -1drt $HOME/lttng-traces/$SESSION*/* | tail -1) > > > > > > babeltrace $last > snapshot/$ i.bt > > > > > > cat snapshot/$ i.bt | wc -l > > > > > > cat snapshot/$ i.bt | head -1 > > > > > > cat snapshot/$ i.bt | tail -1 > > > > > > diff -u snapshot/$ i.bt snapshot/$ h.bt > snapshot/$i_$h.diff > > > > > > cat snapshot/$i_$h.diff | grep "@@" | wc -l | xargs echo "NUMBER OF DIFF > > > HUNKS = " > > > > > > h=$i > > > > > > done > > > > > > $LTTNG stop > > > > > > $LTTNG destroy $SESSION > > > > > > --------------------------------------------------------------- > > > > > > result from run (removed some output lines for readability) > > > > > > --------------------------------------------------------------- > > > > > > lttng (LTTng Trace Control) 2.4.2 - Époque Opaque > > > > > > Snapshot mode set. Every channel enabled for that session will be set in > > > overwrite mode and mmap output. > > > > > > Warning: The number of subbuffers (6) is rounded to the next power of 2 > > > (8) > > > > > > UST channel ch1 enabled for session test > > > > > > Warning: The subbuf size (24576) is rounded to the next power of 2 > > > (32768) > > > > > > UST channel ch2 enabled for session test > > > > > > UST event * created in channel ch1 > > > > > > Tracing started for session test > > > > > > Tracing session test: [active snapshot] > > > > > > Trace path: > > > > > > === Domain: UST global === > > > > > > Buffer type: per UID > > > > > > Channels: > > > > > > ------------- > > > > > > - ch1: [enabled] > > > > > > Attributes: > > > > > > overwrite mode: 1 > > > > > > subbufers size: 4096 > > > > > > number of subbufers: 8 > > > > > > switch timer interval: 0 > > > > > > read timer interval: 0 > > > > > > output: mmap() > > > > > > Events: > > > > > > * (type: tracepoint) [enabled] > > > > > > - ch2: [enabled] > > > > > > Attributes: > > > > > > overwrite mode: 1 > > > > > > subbufers size: 32768 > > > > > > number of subbufers: 2 > > > > > > switch timer interval: 0 > > > > > > read timer interval: 0 > > > > > > output: mmap() > > > > > > Events: > > > > > > None > > > > > > Snapshot recorded successfully for session test > > > > > > 1011 > > > > > > [07:09:43.278933559] (+?.?????????) 07f8cf84a38f > > > ust_baddr_statedump:soinfo: > > > { cpu_id = 6 }, { baddr = 0x7FFF645EE000, sopath = "[vdso]", size = 0, > > > mtime > > > = -1 } > > > > > > [07:09:43.279975100] (+0.000000975) 07f8cf84a38f tracetest:first_tp: { > > > cpu_id > > > = 5 }, { my_string_field = "test", my_integer_field = 999 } > > > > > > NUMBER OF DIFF HUNKS = 0 > > > > > > Snapshot recorded successfully for session test > > > > > > 2022 > > > > > > [07:09:43.278933559] (+?.?????????) 07f8cf84a38f > > > ust_baddr_statedump:soinfo: > > > { cpu_id = 6 }, { baddr = 0x7FFF645EE000, sopath = "[vdso]", size = 0, > > > mtime > > > = -1 } > > > > > > [07:09:43.337703508] (+0.000001238) 07f8cf84a38f tracetest:first_tp: { > > > cpu_id > > > = 4 }, { my_string_field = "test", my_integer_field = 999 } > > > > > > NUMBER OF DIFF HUNKS = 1 > > > > > > Snapshot recorded successfully for session test > > > > > > 3033 > > > > > > [07:09:43.278933559] (+?.?????????) 07f8cf84a38f > > > ust_baddr_statedump:soinfo: > > > { cpu_id = 6 }, { baddr = 0x7FFF645EE000, sopath = "[vdso]", size = 0, > > > mtime > > > = -1 } > > > > > > [07:09:43.432827051] (+0.000000308) 07f8cf84a38f tracetest:first_tp: { > > > cpu_id > > > = 6 }, { my_string_field = "test", my_integer_field = 999 } > > > > > > NUMBER OF DIFF HUNKS = 1 > > > > > > Snapshot recorded successfully for session test > > > > > > 4044 > > > > > > [07:09:43.278933559] (+?.?????????) 07f8cf84a38f > > > ust_baddr_statedump:soinfo: > > > { cpu_id = 6 }, { baddr = 0x7FFF645EE000, sopath = "[vdso]", size = 0, > > > mtime > > > = -1 } > > > > > > [07:09:43.475784825] (+0.000000501) 07f8cf84a38f tracetest:first_tp: { > > > cpu_id > > > = 5 }, { my_string_field = "test", my_integer_field = 999 } > > > > > > NUMBER OF DIFF HUNKS = 1 > > > > > > Snapshot recorded successfully for session test > > > > > > 4735 > > > > > > [07:09:43.279031320] (+?.?????????) 07f8cf84a38f tracetest:first_tp: { > > > cpu_id > > > = 5 }, { my_string_field = "test", my_integer_field = 0 } > > > > > > [07:09:43.549945106] (+0.000000517) 07f8cf84a38f tracetest:first_tp: { > > > cpu_id > > > = 6 }, { my_string_field = "test", my_integer_field = 999 } > > > > > > NUMBER OF DIFF HUNKS = 12 > > > > > > Snapshot recorded successfully for session test > > > > > > 4437 > > > > > > [07:09:43.335916076] (+?.?????????) 07f8cf84a38f > > > ust_baddr_statedump:soinfo: > > > { cpu_id = 2 }, { baddr = 0x7FFFAD1FE000, sopath = "[vdso]", size = 0, > > > mtime > > > = -1 } > > > > > > [07:09:43.654229030] (+0.000000436) 07f8cf84a38f tracetest:first_tp: { > > > cpu_id > > > = 5 }, { my_string_field = "test", my_integer_field = 999 } > > > > > > NUMBER OF DIFF HUNKS = 3 > > > > > > Snapshot recorded successfully for session test > > > > > > 4746 > > > > > > [07:09:43.335916076] (+?.?????????) 07f8cf84a38f > > > ust_baddr_statedump:soinfo: > > > { cpu_id = 2 }, { baddr = 0x7FFFAD1FE000, sopath = "[vdso]", size = 0, > > > mtime > > > = -1 } > > > > > > [07:09:43.747509349] (+0.000000502) 07f8cf84a38f tracetest:first_tp: { > > > cpu_id > > > = 6 }, { my_string_field = "test", my_integer_field = 999 } > > > > > > NUMBER OF DIFF HUNKS = 11 > > > > > > Snapshot recorded successfully for session test > > > > > > 5757 > > > > > > [07:09:43.335916076] (+?.?????????) 07f8cf84a38f > > > ust_baddr_statedump:soinfo: > > > { cpu_id = 2 }, { baddr = 0x7FFFAD1FE000, sopath = "[vdso]", size = 0, > > > mtime > > > = -1 } > > > > > > [07:09:43.814938128] (+0.000001371) 07f8cf84a38f tracetest:first_tp: { > > > cpu_id > > > = 4 }, { my_string_field = "test", my_integer_field = 999 } > > > > > > NUMBER OF DIFF HUNKS = 1 > > > > > > Snapshot recorded successfully for session test > > > > > > 6066 > > > > > > [07:09:43.335916076] (+?.?????????) 07f8cf84a38f > > > ust_baddr_statedump:soinfo: > > > { cpu_id = 2 }, { baddr = 0x7FFFAD1FE000, sopath = "[vdso]", size = 0, > > > mtime > > > = -1 } > > > > > > [07:09:43.919112586] (+0.000000990) 07f8cf84a38f tracetest:first_tp: { > > > cpu_id > > > = 6 }, { my_string_field = "test", my_integer_field = 999 } > > > > > > NUMBER OF DIFF HUNKS = 11 > > > > > > Snapshot recorded successfully for session test > > > > > > 6077 > > > > > > [07:09:43.335916076] (+?.?????????) 07f8cf84a38f > > > ust_baddr_statedump:soinfo: > > > { cpu_id = 2 }, { baddr = 0x7FFFAD1FE000, sopath = "[vdso]", size = 0, > > > mtime > > > = -1 } > > > > > > [07:09:44.004234933] (+0.000001213) 07f8cf84a38f tracetest:first_tp: { > > > cpu_id > > > = 6 }, { my_string_field = "test", my_integer_field = 999 } > > > > > > NUMBER OF DIFF HUNKS = 2 > > > > > > Waiting for data availability > > > > > > Tracing stopped for session test > > > > > > Session test destroyed > > > > > > --------------------------------------------------------------- > > > > > > Regards > > > > > > Anders Wallin > > > > > > _______________________________________________ > > > > > > lttng-dev mailing list > > > > > > [email protected] > > > > > > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > > > > > -- > > > Mathieu Desnoyers > > > EfficiOS Inc. > > > http://www.efficios.com > -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
_______________________________________________ lttng-dev mailing list [email protected] http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
