Bug filed: https://bugs.lttng.org/issues/860
Anders Wallin

On Fri, Nov 14, 2014 at 8:58 PM, Julien Desfossez <[email protected]> wrote:
> Indeed, it's a problem. Would you mind filing a bug report on
> https://bugs.lttng.org/projects/lttng-tools so we can keep track of this
> issue?
>
> Thanks,
>
> Julien
>
> On 14-11-13 04:40 PM, Anders Wallin wrote:
> > Looks like the patch creates a design bug.
> > If you have one channel enabled, the max snapshot size will be equal to
> > one buffer times the number of channels (seems to be 8??).
> > So the max AND min size will always be the same, and always much smaller
> > than the content of the buffers.
> > The size argument passed to snapshot record is never used (or, if it's
> > too small, it will complain).
> >
> > A Q&D workaround to get "all" data out is to create a 2nd unused
> > channel like this:
> >
> > : ${NO_OF_SUBBUF:=6}
> > : ${SUBBUF_SIZE:=4096}
> >
> > SESSION=test
> > CHANNEL1=ch1
> > CHANNEL2=ch2
> >
> > CH2_SUBBUF_SIZE=$(($SUBBUF_SIZE*$NO_OF_SUBBUF))
> > .....
> >
> > $LTTNG create $SESSION --snapshot
> > $LTTNG enable-channel -u --subbuf-size $SUBBUF_SIZE --num-subbuf $NO_OF_SUBBUF --overwrite $CHANNEL1
> > $LTTNG enable-channel -u --subbuf-size $CH2_SUBBUF_SIZE --num-subbuf 2 --overwrite $CHANNEL2
> > $LTTNG enable-event -u -c $CHANNEL1 "*"
> > ....
> >
> > Then I found another issue with the content of the traces, but I will
> > file that under another topic.
> >
> > Anders Wallin
> >
> > On Thu, Nov 13, 2014 at 6:41 PM, Anders Wallin <[email protected]> wrote:
> > > No, I get the same problem with or without setting the size.
> > >
> > > Anders Wallin
> > >
> > > On Thu, Nov 13, 2014 at 5:45 PM, Julien Desfossez <[email protected]> wrote:
> > > > Ok, so if you don't limit the size of the snapshot (remove the -m 1G),
> > > > it works as expected?
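[Editor's note] The elided workaround fragment above can be fleshed out into a complete script. This is a sketch, not from the original mail: the dry-run `LTTNG` default and the final `start` step are my additions for illustration; the session/channel names and sizes match the quoted fragment.

```shell
#!/bin/bash
# Sketch of the two-channel workaround quoted above.
# Dry-run by default (commands are echoed) so it can be read without a
# running lttng-sessiond; override with LTTNG='lttng -n' to really run it.
: ${LTTNG:='echo lttng -n'}
: ${NO_OF_SUBBUF:=6}
: ${SUBBUF_SIZE:=4096}

SESSION=test
CHANNEL1=ch1
CHANNEL2=ch2

# ch2's single pair of sub-buffers is as large as all of ch1's combined,
# so the per-channel snapshot size cap now covers ch1's whole content.
CH2_SUBBUF_SIZE=$(($SUBBUF_SIZE * $NO_OF_SUBBUF))

$LTTNG create $SESSION --snapshot
$LTTNG enable-channel -u --subbuf-size $SUBBUF_SIZE --num-subbuf $NO_OF_SUBBUF --overwrite $CHANNEL1
$LTTNG enable-channel -u --subbuf-size $CH2_SUBBUF_SIZE --num-subbuf 2 --overwrite $CHANNEL2
# Events go only to ch1; ch2 exists purely to inflate the snapshot size.
$LTTNG enable-event -u -c $CHANNEL1 "*"
$LTTNG start
```

The point of ch2 is only to raise the snapshot size limit; no events are ever enabled on it.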
> > > >
> > > > Julien
> > > >
> > > > On 14-11-13 10:33 AM, Anders Wallin wrote:
> > > > > The bug is introduced in 68808f4e0bbb3adccff72ff9dab6ec9f3a9e6866.
> > > > >
> > > > > Regards
> > > > >
> > > > > Anders
> > > > >
> > > > > Anders Wallin
> > > > >
> > > > > On Thu, Nov 13, 2014 at 10:39 AM, Anders Wallin <[email protected]> wrote:
> > > > > >
> > > > > > Anders Wallin
> > > > > >
> > > > > > On Thu, Nov 13, 2014 at 8:43 AM, Anders Wallin <[email protected]> wrote:
> > > > > > >
> > > > > > > On Thu, Nov 13, 2014 at 12:11 AM, Julien Desfossez <[email protected]> wrote:
> > > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > On 14-11-12 05:30 PM, Jérémie Galarneau wrote:
> > > > > > > > > On Wed, Nov 12, 2014 at 7:39 AM, Anders Wallin <[email protected]> wrote:
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > Using userspace snapshot mode in 2.5 seems to be broken.
> > > > > > > > > >
> > > > > > > > > > I have a small, stupid application (tracetest) generating 1000 events.
> > > > > > > > > >
> > > > > > > > > > Running the script (the below is a simplified version of the script):
> > > > > > > > > >
> > > > > > > > > > ----------------------------------------------------------------------------------
> > > > > > > > > > #!/bin/bash
> > > > > > > > > > LTTNG='lttng -n'
> > > > > > > > > > lttng-sessiond --no-kernel &
> > > > > > > > >
> > > > > > > > > You should use the "-d" flag to launch the sessiond as a daemon.
> > > > > > > > > Using "&" might cause the "create" command to launch a second
> > > > > > > > > session daemon, since the first one would not have had the time
> > > > > > > > > to create its command sockets.
> > > > > > > > > >
> > > > > > > > > > $LTTNG create test --snapshot
> > > > > > > > > > $LTTNG enable-channel -u --subbuf-size 64k --num-subbuf 16 --overwrite ch1
> > > > > > > > > > $LTTNG enable-event -u -c ch1 "*"
> > > > > > > > > > $LTTNG start
> > > > > > > > > > $LTTNG list ch1
> > > > > > > > > >
> > > > > > > > > > mkdir -p snapshot/
> > > > > > > > > >
> > > > > > > > > > for i in $(seq 1 $LOOPS); do
> > > > > > > > > >     ./tracetest
> > > > > > > > > >     $LTTNG snapshot record -m 1G
> > > > > > > > > >
> > > > > > > > > >     # print no of events and first and last event
> > > > > > > > > >     last=$(ls -1drt $HOME/lttng-traces/$SESSION*/* | tail -1)
> > > > > > > > > >     babeltrace $last > snapshot/$i.bt
> > > > > > > > > >     cat snapshot/$i.bt | wc -l
> > > > > > > > > >     cat snapshot/$i.bt | head -1
> > > > > > > > > >     cat snapshot/$i.bt | tail -1
> > > > > > > > > > done
> > > > > > > > > >
> > > > > > > > > > $LTTNG stop
> > > > > > > > > > $LTTNG destroy $SESSION
> > > > > > > > > > ----------------------------------------------------------------------------------
> > > > > > > > > >
> > > > > > > > > > The result of the run looks like this:
> > > > > > > > > > ./test.run
> > > > > > > > > > Error: consumer err socket second poll error
> > > > > > > > > > Error: Health error occurred in thread_manage_consumer
> > > > > > > > > > PERROR - 13:36:59.685167 [23772/23781]: bind inet: Address already in use
> > > > > > > > > > (in lttcomm_bind_inet_sock() at inet.c:109)
> > > > > > > > > > Warning: Another session daemon is using this JUL port. JUL support will be
> > > > > > > > > > deactivated to prevent interfering with the tracing.
> > > > > > > > >
> > > > > > > > > Is there a reason why you run two session daemons? I'm guessing this
> > > > > > > > > is related to my "& vs -d" comment above.
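[Editor's note] With the "-d" suggestion applied, the reproduction harness might look like the sketch below. Assumptions of mine: the dry-run guard, the `LOOPS` default, and dropping the `-m 1G` limit (the thread reports it made no difference); the lttng commands otherwise mirror the quoted script, and a real run needs lttng 2.5 plus the `tracetest` program.

```shell
#!/bin/bash
# Reproduction sketch with the sessiond daemonized via -d instead of '&',
# so 'lttng create' cannot race it and spawn a second session daemon.
# Dry-run by default; set DRY_RUN=0 to actually trace.
: ${DRY_RUN:=1}
: ${LOOPS:=10}
SESSION=test

if [ "$DRY_RUN" -eq 1 ]; then
    LTTNG='echo lttng -n'
    SESSIOND='echo lttng-sessiond'
else
    LTTNG='lttng -n'
    SESSIOND='lttng-sessiond'
fi

# Per Jérémie's comment: -d daemonizes properly rather than backgrounding.
$SESSIOND --no-kernel -d

$LTTNG create $SESSION --snapshot
$LTTNG enable-channel -u --subbuf-size 64k --num-subbuf 16 --overwrite ch1
$LTTNG enable-event -u -c ch1 "*"
$LTTNG start

mkdir -p snapshot/
for i in $(seq 1 $LOOPS); do
    [ "$DRY_RUN" -eq 1 ] || ./tracetest
    $LTTNG snapshot record
done

$LTTNG stop
$LTTNG destroy $SESSION
```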
> > > > > > > > > >
> > > > > > > > > > lttng (LTTng Trace Control) 2.5.2 - Fumisterie
> > > > > > > > > > BabelTrace Trace Viewer and Converter 1.2.4
> > > > > > > > > > linux-vdso.so.1 => (0x00007fffba966000)
> > > > > > > > > > liblttng-ctl.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ctl.so.0 (0x00007fbfe4b1d000)
> > > > > > > > > > librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fbfe4915000)
> > > > > > > > > > libxml2.so.2 => /usr/lib/x86_64-linux-gnu/libxml2.so.2 (0x00007fbfe45ae000)
> > > > > > > > > > libpopt.so.0 => /lib/x86_64-linux-gnu/libpopt.so.0 (0x00007fbfe43a2000)
> > > > > > > > > > libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fbfe4184000)
> > > > > > > > > > libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fbfe3dbd000)
> > > > > > > > > > /lib64/ld-linux-x86-64.so.2 (0x00007fbfe4fac000)
> > > > > > > > > > liburcu.so.2 => /usr/lib/x86_64-linux-gnu/liburcu.so.2 (0x00007fbfe3bb6000)
> > > > > > > > > > libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fbfe39b2000)
> > > > > > > > > > libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fbfe3798000)
> > > > > > > > > > liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007fbfe3576000)
> > > > > > > > > > libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fbfe3270000)
> > > > > > > > > >
> > > > > > > > > > Session test created.
> > > > > > > > > > Default snapshot output set to:
> > > > > > > > > > /home/awallin/lttng-traces/test-20141112-133700
> > > > > > > > > > Snapshot mode set. Every channel enabled for that session will be set in
> > > > > > > > > > overwrite mode and mmap output.
> > > > > > > > > > UST channel ch1 enabled for session test
> > > > > > > > > > UST event * created in channel ch1
> > > > > > > > > > Tracing started for session test
> > > > > > > > > > Tracing session test: [active snapshot]
> > > > > > > > > > Trace path:
> > > > > > > > > > Live timer interval (usec): 4294967295
> > > > > > > > > >
> > > > > > > > > > === Domain: UST global ===
> > > > > > > > > >
> > > > > > > > > > Channels:
> > > > > > > > > > -------------
> > > > > > > > > > - ch1: [enabled]
> > > > > > > > > >
> > > > > > > > > >   Attributes:
> > > > > > > > > >     overwrite mode: 1
> > > > > > > > > >     subbufers size: 4096
> > > > > > > > > >     number of subbufers: 16
> > > > > > > > > >     switch timer interval: 0
> > > > > > > > > >     read timer interval: 0
> > > > > > > > > >     trace file count: 0
> > > > > > > > > >     trace file size (bytes): 0
> > > > > > > > > >     output: mmap()
> > > > > > > > > >
> > > > > > > > > >   Events:
> > > > > > > > > >     * (type: tracepoint) [enabled]
> > > > > > > > > >
> > > > > > > > > > Snapshot recorded successfully for session test
> > > > > > > > > > 84
> > > > > > > > > > [13:37:00.852615950] (+?.?????????) arn-awallin-mint-l3 ust_baddr_statedump:soinfo: { cpu_id = 4 }, { baddr = 0x7FFFC6BC7000, sopath = "[vdso]", size = 0, mtime = -1 }
> > > > > > > > > > [13:37:00.853798949] (+0.000000805) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 6 }, { my_string_field = "test", my_integer_field = 999 }
> > > > > > > > > > Snapshot recorded successfully for session test
> > > > > > > > > > 157
> > > > > > > > > > [13:37:00.853735195] (+?.?????????) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 6 }, { my_string_field = "test", my_integer_field = 927 }
> > > > > > > > > > [13:37:00.894839116] (+0.000000299) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 5 }, { my_string_field = "test", my_integer_field = 999 }
> > > > > > > > > > Snapshot recorded successfully for session test
> > > > > > > > > > 168
> > > > > > > > > > [13:37:00.853735195] (+?.?????????) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 6 }, { my_string_field = "test", my_integer_field = 927 }
> > > > > > > > > > [13:37:00.929447330] (+0.000000305) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 5 }, { my_string_field = "test", my_integer_field = 999 }
> > > > > > > > > > Snapshot recorded successfully for session test
> > > > > > > >
> > > > > > > > Something is wrong from here:
> > > > > > > >
> > > > > > > > > > 168
> > > > > > > > > > [13:37:00.894411117] (+?.?????????) arn-awallin-mint-l3 ust_baddr_statedump:soinfo: { cpu_id = 4 }, { baddr = 0x7FFF6BF81000, sopath = "[vdso]", size = 0, mtime = -1 }
> > > > > > > > > > [13:37:00.972035546] (+0.000000597) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 6 }, { my_string_field = "test", my_integer_field = 999 }
> > > > > > > > > > Snapshot recorded successfully for session test
> > > > > > > > > > 168
> > > > > > > > > > [13:37:00.894411117] (+?.?????????) arn-awallin-mint-l3 ust_baddr_statedump:soinfo: { cpu_id = 4 }, { baddr = 0x7FFF6BF81000, sopath = "[vdso]", size = 0, mtime = -1 }
> > > > > > > > > > [13:37:01.014060940] (+0.000000341) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 7 }, { my_string_field = "test", my_integer_field = 999 }
> > > > > > > > > > Snapshot recorded successfully for session test
> > > > > > > > > > 168
> > > > > > > > > > [13:37:00.894411117] (+?.?????????) arn-awallin-mint-l3 ust_baddr_statedump:soinfo: { cpu_id = 4 }, { baddr = 0x7FFF6BF81000, sopath = "[vdso]", size = 0, mtime = -1 }
> > > > > > > > > > [13:37:01.052555877] (+0.000000872) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 6 }, { my_string_field = "test", my_integer_field = 999 }
> > > > > > > >
> > > > > > > > ... to here.
> > > > > > > >
> > > > > > > > The beginning timestamp is the same, the number of events is the same,
> > > > > > > > but the end timestamp moves. We also see this pattern below.
> > > > > > > > I would be really interested to see the whole snapshots when it happens.
> > > > > > > > It seems like we are not overwriting the beginning of the ring buffer
> > > > > > > > in some conditions. Could you send us your tracetest program and the
> > > > > > > > script so we can try to reproduce the problem?
> > > > > > >
> > > > > > > I will append it, including a run with debug turned on and all
> > > > > > > the snapshot files created.
> > > > > > >
> > > > > > > > > > Snapshot recorded successfully for session test
> > > > > > > > > > 168
> > > > > > > > > > [13:37:01.014035993] (+?.?????????) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 7 }, { my_string_field = "test", my_integer_field = 927 }
> > > > > > > > > > [13:37:01.098620808] (+0.000001404) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 999 }
> > > > > > > > > > Snapshot recorded successfully for session test
> > > > > > > > > > 168
> > > > > > > > > > [13:37:01.097087388] (+?.?????????) arn-awallin-mint-l3 ust_baddr_statedump:soinfo: { cpu_id = 6 }, { baddr = 0x7FFF4BFFE000, sopath = "[vdso]", size = 0, mtime = -1 }
> > > > > > > > > > [13:37:01.129839830] (+0.000000533) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 7 }, { my_string_field = "test", my_integer_field = 999 }
> > > > > > > > > > Snapshot recorded successfully for session test
> > > > > > > > > > 168
> > > > > > > > > > [13:37:01.097087388] (+?.?????????) arn-awallin-mint-l3 ust_baddr_statedump:soinfo: { cpu_id = 6 }, { baddr = 0x7FFF4BFFE000, sopath = "[vdso]", size = 0, mtime = -1 }
> > > > > > > > > > [13:37:01.129839830] (+0.000000533) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 7 }, { my_string_field = "test", my_integer_field = 999 }
> > > > > > > > > > Snapshot recorded successfully for session test
> > > > > > > > > > 168
> > > > > > > > > > [13:37:01.097087388] (+?.?????????) arn-awallin-mint-l3 ust_baddr_statedump:soinfo: { cpu_id = 6 }, { baddr = 0x7FFF4BFFE000, sopath = "[vdso]", size = 0, mtime = -1 }
> > > > > > > > > > [13:37:01.172829105] (+0.000000575) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 5 }, { my_string_field = "test", my_integer_field = 999 }
> > > > > > > > > > Snapshot recorded successfully for session test
> > > > > > > > > > 168
> > > > > > > > > > [13:37:01.172021557] (+?.?????????) arn-awallin-mint-l3 ust_baddr_statedump:soinfo: { cpu_id = 7 }, { baddr = 0x7FFF653FE000, sopath = "[vdso]", size = 0, mtime = -1 }
> > > > > > > > > > [13:37:01.220074165] (+0.000000553) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 6 }, { my_string_field = "test", my_integer_field = 999 }
> > > > > > > > > > Waiting for data availability
> > > > > > > > > > Tracing stopped for session test
> > > > > > > > > > Session test destroyed
> > > > > > > > > >
> > > > > > > > > > The result from the run on older versions of lttng creates bigger and
> > > > > > > > > > bigger event traces until reaching the max size.
> > > > > > > >
> > > > > > > > We see this pattern here, and the max size seems to be 168 events;
> > > > > > > > was it similar in your previous tests?
> > > > > > >
> > > > > > > If I'm running the same test on 2.4 or 2.3, the max number of events
> > > > > > > is much bigger, like ~10000.
> > > > > > > (I don't have the exact number right now; I will rebuild and run the
> > > > > > > test again and get back with traces from that run.)
> > > > > > >
> > > > > > > For 2.4.2 the log looks like this:
> > > > > > > ./test.run ./tracetest
> > > > > > >
> > > > > > > lttng (LTTng Trace Control) 2.4.2 - Époque Opaque
> > > > > > > BabelTrace Trace Viewer and Converter 1.2.1
> > > > > > > linux-vdso.so.1 => (0x00007fff775fe000)
> > > > > > > liblttng-ctl.so.0 => /usr/local/lib/liblttng-ctl.so.0 (0x00002b2b016e4000)
> > > > > > > libpopt.so.0 => /lib/x86_64-linux-gnu/libpopt.so.0 (0x00002b2b0190f000)
> > > > > > > libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00002b2b01b1b000)
> > > > > > > libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00002b2b01d39000)
> > > > > > > librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00002b2b020ff000)
> > > > > > > liburcu.so.3 => /usr/local/lib/liburcu.so.3 (0x00002b2b02307000)
> > > > > > > /lib64/ld-linux-x86-64.so.2 (0x00002b2b014bf000)
> > > > > > >
> > > > > > > Session test created.
> > > > > > > Default snapshot output set to: //lttng-traces/test-20141113-093611
> > > > > > > Snapshot mode set. Every channel enabled for that session will be
> > > > > > > set in overwrite mode and mmap output.
> > > > > > > UST channel ch1 enabled for session test
> > > > > > > UST event * created in channel ch1
> > > > > > > Tracing started for session test
> > > > > > > Tracing session test: [active snapshot]
> > > > > > > Trace path:
> > > > > > >
> > > > > > > === Domain: UST global ===
> > > > > > >
> > > > > > > Buffer type: per UID
> > > > > > >
> > > > > > > Channels:
> > > > > > > -------------
> > > > > > > - ch1: [enabled]
> > > > > > >
> > > > > > >   Attributes:
> > > > > > >     overwrite mode: 1
> > > > > > >     subbufers size: 4096
> > > > > > >     number of subbufers: 16
> > > > > > >     switch timer interval: 0
> > > > > > >     read timer interval: 0
> > > > > > >     output: mmap()
> > > > > > >
> > > > > > >   Events:
> > > > > > >     * (type: tracepoint) [enabled]
> > > > > > >
> > > > > > > Snapshot recorded successfully for session test
> > > > > > > 1000
> > > > > > > [09:36:11.985319272] (+?.?????????) 07f8cf84a38f tracetest:first_tp: { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 0 }
> > > > > > > [09:36:11.985857254] (+0.000000501) 07f8cf84a38f tracetest:first_tp: { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 999 }
> > > > > > > Snapshot recorded successfully for session test
> > > > > > > 2000
> > > > > > > [09:36:11.985319272] (+?.?????????) 07f8cf84a38f tracetest:first_tp: { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 0 }
> > > > > > > [09:36:12.033066130] (+0.000001663) 07f8cf84a38f tracetest:first_tp: { cpu_id = 5 }, { my_string_field = "test", my_integer_field = 999 }
> > > > > > > Snapshot recorded successfully for session test
> > > > > > > 3000
> > > > > > > [09:36:11.985319272] (+?.?????????) 07f8cf84a38f tracetest:first_tp: { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 0 }
> > > > > > > [09:36:12.086888121] (+0.000001273) 07f8cf84a38f tracetest:first_tp: { cpu_id = 5 }, { my_string_field = "test", my_integer_field = 999 }
> > > > > > > Snapshot recorded successfully for session test
> > > > > > > 4000
> > > > > > > [09:36:11.985319272] (+?.?????????) 07f8cf84a38f tracetest:first_tp: { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 0 }
> > > > > > > [09:36:12.147378251] (+0.000001257) 07f8cf84a38f tracetest:first_tp: { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 999 }
> > > > > > > Snapshot recorded successfully for session test
> > > > > > > 5000
> > > > > > > [09:36:11.985319272] (+?.?????????) 07f8cf84a38f tracetest:first_tp: { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 0 }
> > > > > > > [09:36:12.225584696] (+0.000000528) 07f8cf84a38f tracetest:first_tp: { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 999 }
> > > > > > > Snapshot recorded successfully for session test
> > > > > > > 6000
> > > > > > > [09:36:11.985319272] (+?.?????????) 07f8cf84a38f tracetest:first_tp: { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 0 }
> > > > > > > [09:36:12.288311172] (+0.000000571) 07f8cf84a38f tracetest:first_tp: { cpu_id = 7 }, { my_string_field = "test", my_integer_field = 999 }
> > > > > > > Snapshot recorded successfully for session test
> > > > > > > 7000
> > > > > > > [09:36:11.985319272] (+?.?????????) 07f8cf84a38f tracetest:first_tp: { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 0 }
> > > > > > > [09:36:12.375789307] (+0.000001238) 07f8cf84a38f tracetest:first_tp: { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 999 }
> > > > > > > Snapshot recorded successfully for session test
> > > > > > > 7000
> > > > > > > [09:36:12.032314084] (+?.?????????) 07f8cf84a38f tracetest:first_tp: { cpu_id = 5 }, { my_string_field = "test", my_integer_field = 0 }
> > > > > > > [09:36:12.460756779] (+0.000001248) 07f8cf84a38f tracetest:first_tp: { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 999 }
> > > > > > > Snapshot recorded successfully for session test
> > > > > > > 8000
> > > > > > > [09:36:12.032314084] (+?.?????????) 07f8cf84a38f tracetest:first_tp: { cpu_id = 5 }, { my_string_field = "test", my_integer_field = 0 }
> > > > > > > [09:36:12.540464090] (+0.000001292) 07f8cf84a38f tracetest:first_tp: { cpu_id = 7 }, { my_string_field = "test", my_integer_field = 999 }
> > > > > > > Snapshot recorded successfully for session test
> > > > > > > 9000
> > > > > > > [09:36:12.032314084] (+?.?????????) 07f8cf84a38f tracetest:first_tp: { cpu_id = 5 }, { my_string_field = "test", my_integer_field = 0 }
> > > > > > > [09:36:12.621992931] (+0.000001298) 07f8cf84a38f tracetest:first_tp: { cpu_id = 5 }, { my_string_field = "test", my_integer_field = 999 }
> > > > > > > Waiting for data availability
> > > > > > > Tracing stopped for session test
> > > > > > > Session test destroyed
> > > > > > >
> > > > > > > > > Unless I'm mistaken, the behavior of the snapshot mode has changed
> > > > > > > > > between 2.4 and 2.5.
> > > > > > > > >
> > > > > > > > > In the 2.4 series, consecutive snapshots could overlap; that is,
> > > > > > > > > events could be present in multiple snapshots. As of 2.5, an event is
> > > > > > > > > guaranteed to only appear in one snapshot, as the events are considered
> > > > > > > > > as having been "consumed" at that point.
> > > > > > > >
> > > > > > > > Hum, no, the behavior is still the same: consecutive snapshots can overlap.
> > > > > > > >
> > > > > > > > Could you tell us on which architecture you are experiencing the
> > > > > > > > problem, and the number of CPUs?
> > > > > > >
> > > > > > > The traces appended are from my laptop running Ubuntu 14.04 with "8" CPUs:
> > > > > > > model name : Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
> > > > > > > I have also run this on an ARM with 16 cores, and on a powerpc.
> > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Julien
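[Editor's note] The by-hand check performed throughout the thread (event count plus first/last timestamp per snapshot) can be scripted. This sketch runs on two fabricated babeltrace-style dump files so it is self-contained; with a real session you would point it at the actual snapshot/*.bt files produced by the reproduction script.

```shell
#!/bin/bash
# Summarize each snapshot dump: event count, first and last timestamp.
# This makes the 2.5 symptom easy to spot: the count pins at ~168 and the
# first timestamp freezes while only the last timestamp advances.
# The two fake .bt files below stand in for real babeltrace output.
mkdir -p snapshot
printf '[13:37:00.852615950] ev1\n[13:37:00.853798949] ev2\n' > snapshot/1.bt
printf '[13:37:00.853735195] ev1\n[13:37:00.894839116] ev2\n[13:37:00.900000000] ev3\n' > snapshot/2.bt

for f in snapshot/*.bt; do
    count=$(wc -l < "$f")
    # Timestamp is the bracketed field at the start of each babeltrace line.
    first=$(head -1 "$f" | cut -d']' -f1 | tr -d '[')
    last=$(tail -1 "$f" | cut -d']' -f1 | tr -d '[')
    echo "$f: $count events, first=$first last=$last"
done
```

On the 2.5 run quoted above this would show the count stuck at 168 with a frozen first timestamp; on the 2.4.2 run the count keeps growing toward the full 1000 events per iteration.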
_______________________________________________
lttng-dev mailing list
[email protected]
http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
