Filed as https://bugs.lttng.org/issues/862
Anders Wallin On Mon, Nov 17, 2014 at 12:45 PM, Mathieu Desnoyers < [email protected]> wrote: > Hi Anders, > > Yes, please file a feature request against Babeltrace on > the bug tracker pointing the this email thread and > describing the feature. > > Thanks! > > Mathieu > > > ------------------------------ > > *From: *"Anders Wallin" <[email protected]> > *To: *"Mathieu Desnoyers" <[email protected]> > *Cc: *[email protected] > *Sent: *Monday, November 17, 2014 10:45:37 AM > *Subject: *Re: [lttng-dev] snapshot data corrupted after the output > buffer is filled > > > Thx Mathieu, > > if I'm running my test on one cpu (taskset -c 1 ./test.run) it works as > expected and described by you. > The babeltrace flag would be a good solution! Do you want me to file a > bug/enhancement on this? > > Regards > > > > Anders Wallin > > On Sat, Nov 15, 2014 at 6:59 PM, Mathieu Desnoyers < > [email protected]> wrote: > >> ------------------------------ >> >> *From: *"Anders Wallin" <[email protected]> >> *To: *[email protected] >> *Sent: *Friday, November 14, 2014 8:28:18 AM >> *Subject: *[lttng-dev] snapshot data corrupted after the output buffer >> is filled >> >> Hi, >> >> it looks like snapshot events became faulty after the output max size is >> reached. >> The next snapshot get mixed events from different buffers. I have tested >> it on 2.3, 2.4 and 2.5 >> and it's in all of them, but it hits earlier in 2.5 due to the problem >> reported in >> http://lists.lttng.org/pipermail/lttng-dev/2014-November/023773.html >> >> There should never be more then 2 HUNKS in a diff between 2 consecutive >> snapshots, here I got >> 11 as the worst case. >> >> Hi Anders, >> >> This is not a bug, rather just per-cpu buffering at work.It does have some >> rather non-obvious consequences in terms of snapshot event flow. >> >> To understand this, we need to keep in mind that LTTng does everything >> in per-cpu buffers, and that overwrite of oldest events in buffer full >> conditions >> happens per-cpu. >> >> The use-case you show below really makes sense when you start to consider >> on which CPU (and thus per-cpu buffer) the application is running. >> Initially, >> the first app runs on CPU 6. As long as the following applications run on >> other cpus, we keep the same first event. Then when enough events are >> written in cpu 6 buffer to overwrite the first event, we see a first >> event change. >> Then, depending on the scheduling of applications, older parts of per-cpu >> buffers are overwritten, and others are not (if there are less events in >> those >> buffer). This means some per-cpu buffers will have events going further >> back >> in time compared to other higher-throughput CPUs. >> >> This is why you see more than 2 hunks in your diff. Babeltrace merges >> those >> per-cpu buffers back into a single text output, and the fact that some >> events >> have been overwritten for a time-range for high-throughput buffers, but >> not >> for low-throughput buffers, make it appear as if events are missing in the >> stream if your model is a single system-wide buffer, but the missing >> events >> make sense if you see it as a merge of per-cpu buffers. >> >> Our current behavior is to gather all available data when taking a >> snapshot, >> and babeltrace behavior is to show all events available in the streams. We >> could perhaps consider a different babeltrace behavior (a flag) that would >> allow it to only show the events for the intersection of time range of all >> streams, based on the packet header timestamp begin/end. This would >> take care of hiding events that happen in low-throughput streams when >> there is no intersection with snapshotted high-throughput streams. >> >> And perhaps this should be the default ? >> >> Thanks! >> >> Mathieu >> >> >> TESTCASE; >> >> tracetest code: >> --------------------------------------------------------------- >> #include <stdio.h> >> #include "tracetest-tp.h" >> >> int main(int argc, char* argv[]) >> { >> int i; >> for (i = 0; i < 1000; i++) { >> tracepoint(tracetest, first_tp, i, "test"); >> } >> return 0; >> } >> --------------------------------------------------------------- >> >> test script: >> --------------------------------------------------------------- >> #!/bin/bash >> : ${NO_OF_SUBBUF:=6} >> : ${SUBBUF_SIZE:=4096} >> : ${KERNEL_TRACE:=NO} >> : ${LOOPS:=10} >> >> SESSION=test >> CHANNEL1=ch1 >> CHANNEL2=ch2 >> >> CH2_SUBBUF_SIZE=$(($SUBBUF_SIZE*$NO_OF_SUBBUF)) >> >> >> LTTNG='lttng -n' >> killall lttng-consumerd > /dev/null 2>&1 >> killall lttng-sessiond > /dev/null 2>&1 >> if [ -z $DEBUG ]; then >> lttng-sessiond --no-kernel -d >> else >> lttng-sessiond -v --no-kernel --verbose-consumer -b >> fi >> sleep 1 >> >> if [ ! -z $DEBUG ]; then >> export LTTNG_UST_DEBUG=1 >> fi >> >> rm -rf $HOME/lttng-traces >> rm snapshot/*.bt >> >> $LTTNG --version >> >> $LTTNG create $SESSION --snapshot >> $LTTNG enable-channel -u --subbuf-size $SUBBUF_SIZE --num-subbuf >> $NO_OF_SUBBUF --overwrite $CHANNEL1 >> $LTTNG enable-channel -u --subbuf-size $CH2_SUBBUF_SIZE --num-subbuf 2 >> --overwrite $CHANNEL2 >> $LTTNG enable-event -u -c $CHANNEL1 "*" >> $LTTNG start >> $LTTNG list $SESSION >> >> mkdir -p snapshot/ >> >> h=1 >> for i in $(seq 1 $LOOPS); do >> $APP >> $LTTNG snapshot record >> # print no of events and first and last event >> >> >> last=$(ls -1drt $HOME/lttng-traces/$SESSION*/* | tail -1) >> babeltrace $last > snapshot/$i.bt >> cat snapshot/$i.bt | wc -l >> cat snapshot/$i.bt | head -1 >> cat snapshot/$i.bt | tail -1 >> diff -u snapshot/$i.bt snapshot/$h.bt > snapshot/$i_$h.diff >> cat snapshot/$i_$h.diff | grep "@@" | wc -l | xargs echo "NUMBER OF >> DIFF HUNKS = " >> h=$i >> done >> >> $LTTNG stop >> $LTTNG destroy $SESSION >> >> --------------------------------------------------------------- >> >> result from run (removed some output lines for readability) >> --------------------------------------------------------------- >> lttng (LTTng Trace Control) 2.4.2 - Époque Opaque >> Snapshot mode set. Every channel enabled for that session will be set in >> overwrite mode and mmap output. >> Warning: The number of subbuffers (6) is rounded to the next power of 2 >> (8) >> UST channel ch1 enabled for session test >> Warning: The subbuf size (24576) is rounded to the next power of 2 (32768) >> UST channel ch2 enabled for session test >> UST event * created in channel ch1 >> Tracing started for session test >> Tracing session test: [active snapshot] >> Trace path: >> >> === Domain: UST global === >> >> Buffer type: per UID >> >> Channels: >> ------------- >> - ch1: [enabled] >> >> Attributes: >> overwrite mode: 1 >> subbufers size: 4096 >> number of subbufers: 8 >> switch timer interval: 0 >> read timer interval: 0 >> output: mmap() >> >> Events: >> * (type: tracepoint) [enabled] >> - ch2: [enabled] >> >> Attributes: >> overwrite mode: 1 >> subbufers size: 32768 >> number of subbufers: 2 >> switch timer interval: 0 >> read timer interval: 0 >> output: mmap() >> >> Events: >> None >> >> Snapshot recorded successfully for session test >> 1011 >> [07:09:43.278933559] (+?.?????????) 07f8cf84a38f >> ust_baddr_statedump:soinfo: { cpu_id = 6 }, { baddr = 0x7FFF645EE000, >> sopath = "[vdso]", size = 0, mtime = -1 } >> [07:09:43.279975100] (+0.000000975) 07f8cf84a38f tracetest:first_tp: { >> cpu_id = 5 }, { my_string_field = "test", my_integer_field = 999 } >> NUMBER OF DIFF HUNKS = 0 >> Snapshot recorded successfully for session test >> 2022 >> [07:09:43.278933559] (+?.?????????) 07f8cf84a38f >> ust_baddr_statedump:soinfo: { cpu_id = 6 }, { baddr = 0x7FFF645EE000, >> sopath = "[vdso]", size = 0, mtime = -1 } >> [07:09:43.337703508] (+0.000001238) 07f8cf84a38f tracetest:first_tp: { >> cpu_id = 4 }, { my_string_field = "test", my_integer_field = 999 } >> NUMBER OF DIFF HUNKS = 1 >> Snapshot recorded successfully for session test >> 3033 >> [07:09:43.278933559] (+?.?????????) 07f8cf84a38f >> ust_baddr_statedump:soinfo: { cpu_id = 6 }, { baddr = 0x7FFF645EE000, >> sopath = "[vdso]", size = 0, mtime = -1 } >> [07:09:43.432827051] (+0.000000308) 07f8cf84a38f tracetest:first_tp: { >> cpu_id = 6 }, { my_string_field = "test", my_integer_field = 999 } >> NUMBER OF DIFF HUNKS = 1 >> Snapshot recorded successfully for session test >> 4044 >> [07:09:43.278933559] (+?.?????????) 07f8cf84a38f >> ust_baddr_statedump:soinfo: { cpu_id = 6 }, { baddr = 0x7FFF645EE000, >> sopath = "[vdso]", size = 0, mtime = -1 } >> [07:09:43.475784825] (+0.000000501) 07f8cf84a38f tracetest:first_tp: { >> cpu_id = 5 }, { my_string_field = "test", my_integer_field = 999 } >> NUMBER OF DIFF HUNKS = 1 >> Snapshot recorded successfully for session test >> 4735 >> [07:09:43.279031320] (+?.?????????) 07f8cf84a38f tracetest:first_tp: { >> cpu_id = 5 }, { my_string_field = "test", my_integer_field = 0 } >> [07:09:43.549945106] (+0.000000517) 07f8cf84a38f tracetest:first_tp: { >> cpu_id = 6 }, { my_string_field = "test", my_integer_field = 999 } >> NUMBER OF DIFF HUNKS = 12 >> Snapshot recorded successfully for session test >> 4437 >> [07:09:43.335916076] (+?.?????????) 07f8cf84a38f >> ust_baddr_statedump:soinfo: { cpu_id = 2 }, { baddr = 0x7FFFAD1FE000, >> sopath = "[vdso]", size = 0, mtime = -1 } >> [07:09:43.654229030] (+0.000000436) 07f8cf84a38f tracetest:first_tp: { >> cpu_id = 5 }, { my_string_field = "test", my_integer_field = 999 } >> NUMBER OF DIFF HUNKS = 3 >> Snapshot recorded successfully for session test >> 4746 >> [07:09:43.335916076] (+?.?????????) 07f8cf84a38f >> ust_baddr_statedump:soinfo: { cpu_id = 2 }, { baddr = 0x7FFFAD1FE000, >> sopath = "[vdso]", size = 0, mtime = -1 } >> [07:09:43.747509349] (+0.000000502) 07f8cf84a38f tracetest:first_tp: { >> cpu_id = 6 }, { my_string_field = "test", my_integer_field = 999 } >> NUMBER OF DIFF HUNKS = 11 >> Snapshot recorded successfully for session test >> 5757 >> [07:09:43.335916076] (+?.?????????) 07f8cf84a38f >> ust_baddr_statedump:soinfo: { cpu_id = 2 }, { baddr = 0x7FFFAD1FE000, >> sopath = "[vdso]", size = 0, mtime = -1 } >> [07:09:43.814938128] (+0.000001371) 07f8cf84a38f tracetest:first_tp: { >> cpu_id = 4 }, { my_string_field = "test", my_integer_field = 999 } >> NUMBER OF DIFF HUNKS = 1 >> Snapshot recorded successfully for session test >> 6066 >> [07:09:43.335916076] (+?.?????????) 07f8cf84a38f >> ust_baddr_statedump:soinfo: { cpu_id = 2 }, { baddr = 0x7FFFAD1FE000, >> sopath = "[vdso]", size = 0, mtime = -1 } >> [07:09:43.919112586] (+0.000000990) 07f8cf84a38f tracetest:first_tp: { >> cpu_id = 6 }, { my_string_field = "test", my_integer_field = 999 } >> NUMBER OF DIFF HUNKS = 11 >> Snapshot recorded successfully for session test >> 6077 >> [07:09:43.335916076] (+?.?????????) 07f8cf84a38f >> ust_baddr_statedump:soinfo: { cpu_id = 2 }, { baddr = 0x7FFFAD1FE000, >> sopath = "[vdso]", size = 0, mtime = -1 } >> [07:09:44.004234933] (+0.000001213) 07f8cf84a38f tracetest:first_tp: { >> cpu_id = 6 }, { my_string_field = "test", my_integer_field = 999 } >> NUMBER OF DIFF HUNKS = 2 >> Waiting for data availability >> Tracing stopped for session test >> Session test destroyed >> --------------------------------------------------------------- >> >> Regards >> >> Anders Wallin >> >> _______________________________________________ >> lttng-dev mailing list >> [email protected] >> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev >> >> >> >> >> -- >> Mathieu Desnoyers >> EfficiOS Inc. >> http://www.efficios.com >> > > > > > -- > Mathieu Desnoyers > EfficiOS Inc. > http://www.efficios.com >
_______________________________________________ lttng-dev mailing list [email protected] http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
