Anders Wallin

On Thu, Nov 13, 2014 at 8:43 AM, Anders Wallin <[email protected]> wrote:
> On Thu, Nov 13, 2014 at 12:11 AM, Julien Desfossez <[email protected]> wrote:
>
>> Hi,
>>
>> On 14-11-12 05:30 PM, Jérémie Galarneau wrote:
>> > On Wed, Nov 12, 2014 at 7:39 AM, Anders Wallin <[email protected]> wrote:
>> >> Hi,
>> >>
>> >> Using userspace snapshot mode in 2.5 seems to be broken.
>> >>
>> >> I have a small, stupid application (tracetest) generating 1000 events.
>> >>
>> >> Running the script (the below is a simplified version of the script):
>> >> ----------------------------------------------------------------------------------
>> >> #!/bin/bash
>> >> LTTNG='lttng -n'
>> >> lttng-sessiond --no-kernel &
>> >
>> > You should use the "-d" flag to launch the sessiond as a daemon. Using
>> > "&" might cause the "create" command to launch a second session daemon,
>> > since the first one would not have had the time to open its command
>> > sockets.
>> >
>> >> $LTTNG create test --snapshot
>> >> $LTTNG enable-channel -u --subbuf-size 64k --num-subbuf 16 --overwrite ch1
>> >> $LTTNG enable-event -u -c ch1 "*"
>> >> $LTTNG start
>> >> $LTTNG list ch1
>> >>
>> >> mkdir -p snapshot/
>> >>
>> >> for i in $(seq 1 $LOOPS); do
>> >>     ./tracetest
>> >>     $LTTNG snapshot record -m 1G
>> >>
>> >>     # print no of events and first and last event
>> >>     last=$(ls -1drt $HOME/lttng-traces/$SESSION*/* | tail -1)
>> >>     babeltrace $last > snapshot/$i.bt
>> >>     cat snapshot/$i.bt | wc -l
>> >>     cat snapshot/$i.bt | head -1
>> >>     cat snapshot/$i.bt | tail -1
>> >> done
>> >>
>> >> $LTTNG stop
>> >> $LTTNG destroy $SESSION
>> >> ----------------------------------------------------------------------------------
>> >>
>> >> The result of the run looks like this:
>> >> ./test.run
>> >> Error: consumer err socket second poll error
>> >> Error: Health error occurred in thread_manage_consumer
>> >> PERROR - 13:36:59.685167 [23772/23781]: bind inet: Address already in use
>> >> (in lttcomm_bind_inet_sock() at inet.c:109)
>> >> Warning: Another session daemon is using this JUL port. JUL support will be
>> >> deactivated to prevent interfering with the tracing.
>> >
>> > Is there a reason why you run two session daemons? I'm guessing this
>> > is related to my "& vs -d" comment above.
>> >
>> >> lttng (LTTng Trace Control) 2.5.2 - Fumisterie
>> >> BabelTrace Trace Viewer and Converter 1.2.4
>> >>     linux-vdso.so.1 => (0x00007fffba966000)
>> >>     liblttng-ctl.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ctl.so.0 (0x00007fbfe4b1d000)
>> >>     librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fbfe4915000)
>> >>     libxml2.so.2 => /usr/lib/x86_64-linux-gnu/libxml2.so.2 (0x00007fbfe45ae000)
>> >>     libpopt.so.0 => /lib/x86_64-linux-gnu/libpopt.so.0 (0x00007fbfe43a2000)
>> >>     libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fbfe4184000)
>> >>     libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fbfe3dbd000)
>> >>     /lib64/ld-linux-x86-64.so.2 (0x00007fbfe4fac000)
>> >>     liburcu.so.2 => /usr/lib/x86_64-linux-gnu/liburcu.so.2 (0x00007fbfe3bb6000)
>> >>     libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fbfe39b2000)
>> >>     libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fbfe3798000)
>> >>     liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007fbfe3576000)
>> >>     libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fbfe3270000)
>> >>
>> >> Session test created.
>> >> Default snapshot output set to:
>> >> /home/awallin/lttng-traces/test-20141112-133700
>> >> Snapshot mode set. Every channel enabled for that session will be set in
>> >> overwrite mode and mmap output.
>> >> UST channel ch1 enabled for session test
>> >> UST event * created in channel ch1
>> >> Tracing started for session test
>> >> Tracing session test: [active snapshot]
>> >>     Trace path:
>> >>     Live timer interval (usec): 4294967295
>> >>
>> >> === Domain: UST global ===
>> >>
>> >> Channels:
>> >> -------------
>> >> - ch1: [enabled]
>> >>
>> >>     Attributes:
>> >>       overwrite mode: 1
>> >>       subbufers size: 4096
>> >>       number of subbufers: 16
>> >>       switch timer interval: 0
>> >>       read timer interval: 0
>> >>       trace file count: 0
>> >>       trace file size (bytes): 0
>> >>       output: mmap()
>> >>
>> >>     Events:
>> >>       * (type: tracepoint) [enabled]
>> >>
>> >> Snapshot recorded successfully for session test
>> >> 84
>> >> [13:37:00.852615950] (+?.?????????) arn-awallin-mint-l3 ust_baddr_statedump:soinfo: { cpu_id = 4 }, { baddr = 0x7FFFC6BC7000, sopath = "[vdso]", size = 0, mtime = -1 }
>> >> [13:37:00.853798949] (+0.000000805) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 6 }, { my_string_field = "test", my_integer_field = 999 }
>> >> Snapshot recorded successfully for session test
>> >> 157
>> >> [13:37:00.853735195] (+?.?????????) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 6 }, { my_string_field = "test", my_integer_field = 927 }
>> >> [13:37:00.894839116] (+0.000000299) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 5 }, { my_string_field = "test", my_integer_field = 999 }
>> >> Snapshot recorded successfully for session test
>> >> 168
>> >> [13:37:00.853735195] (+?.?????????) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 6 }, { my_string_field = "test", my_integer_field = 927 }
>> >> [13:37:00.929447330] (+0.000000305) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 5 }, { my_string_field = "test", my_integer_field = 999 }
>> >> Snapshot recorded successfully for session test
>> >> Something is wrong from here:
>> >> 168
>> >> [13:37:00.894411117] (+?.?????????) arn-awallin-mint-l3 ust_baddr_statedump:soinfo: { cpu_id = 4 }, { baddr = 0x7FFF6BF81000, sopath = "[vdso]", size = 0, mtime = -1 }
>> >> [13:37:00.972035546] (+0.000000597) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 6 }, { my_string_field = "test", my_integer_field = 999 }
>> >> Snapshot recorded successfully for session test
>> >> 168
>> >> [13:37:00.894411117] (+?.?????????) arn-awallin-mint-l3 ust_baddr_statedump:soinfo: { cpu_id = 4 }, { baddr = 0x7FFF6BF81000, sopath = "[vdso]", size = 0, mtime = -1 }
>> >> [13:37:01.014060940] (+0.000000341) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 7 }, { my_string_field = "test", my_integer_field = 999 }
>> >> Snapshot recorded successfully for session test
>> >> 168
>> >> [13:37:00.894411117] (+?.?????????) arn-awallin-mint-l3 ust_baddr_statedump:soinfo: { cpu_id = 4 }, { baddr = 0x7FFF6BF81000, sopath = "[vdso]", size = 0, mtime = -1 }
>> >> [13:37:01.052555877] (+0.000000872) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 6 }, { my_string_field = "test", my_integer_field = 999 }
>> >> ... to here.
>>
>> The beginning timestamp is the same, the number of events is the same,
>> but the end timestamp moves. We also see this pattern below.
>> I would be really interested to see the whole snapshots when it happens.
>> It seems like we are not overwriting the beginning of the ring buffer
>> in some conditions.
>> Could you send us your tracetest program and the script so we can try
>> to reproduce the problem?
>
> I will append it, including a run with debug turned on and all the
> snapshot files created.
>
>> >> Snapshot recorded successfully for session test
>> >> 168
>> >> [13:37:01.014035993] (+?.?????????) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 7 }, { my_string_field = "test", my_integer_field = 927 }
>> >> [13:37:01.098620808] (+0.000001404) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 999 }
>> >> Snapshot recorded successfully for session test
>> >> 168
>> >> [13:37:01.097087388] (+?.?????????) arn-awallin-mint-l3 ust_baddr_statedump:soinfo: { cpu_id = 6 }, { baddr = 0x7FFF4BFFE000, sopath = "[vdso]", size = 0, mtime = -1 }
>> >> [13:37:01.129839830] (+0.000000533) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 7 }, { my_string_field = "test", my_integer_field = 999 }
>> >> Snapshot recorded successfully for session test
>> >> 168
>> >> [13:37:01.097087388] (+?.?????????) arn-awallin-mint-l3 ust_baddr_statedump:soinfo: { cpu_id = 6 }, { baddr = 0x7FFF4BFFE000, sopath = "[vdso]", size = 0, mtime = -1 }
>> >> [13:37:01.129839830] (+0.000000533) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 7 }, { my_string_field = "test", my_integer_field = 999 }
>> >> Snapshot recorded successfully for session test
>> >> 168
>> >> [13:37:01.097087388] (+?.?????????) arn-awallin-mint-l3 ust_baddr_statedump:soinfo: { cpu_id = 6 }, { baddr = 0x7FFF4BFFE000, sopath = "[vdso]", size = 0, mtime = -1 }
>> >> [13:37:01.172829105] (+0.000000575) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 5 }, { my_string_field = "test", my_integer_field = 999 }
>> >> Snapshot recorded successfully for session test
>> >> 168
>> >> [13:37:01.172021557] (+?.?????????) arn-awallin-mint-l3 ust_baddr_statedump:soinfo: { cpu_id = 7 }, { baddr = 0x7FFF653FE000, sopath = "[vdso]", size = 0, mtime = -1 }
>> >> [13:37:01.220074165] (+0.000000553) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 6 }, { my_string_field = "test", my_integer_field = 999 }
>> >> Waiting for data availability
>> >> Tracing stopped for session test
>> >> Session test destroyed
>> >>
>> >> The result from the run on older versions of lttng creates bigger and
>> >> bigger event traces until reaching max size.
>>
>> We see this pattern here and the max size seems to be 168 events; was it
>> similar in your previous tests?
>
> If I'm running the same test on 2.4 or 2.3 the max number of events is
> much bigger, like ~10000.
> (I don't have the exact number right now, I will rebuild and run the test
> again and get back with traces from that run.)

For 2.4.2 the log looks like this:

./test.run
./tracetest
lttng (LTTng Trace Control) 2.4.2 - Époque Opaque
BabelTrace Trace Viewer and Converter 1.2.1
    linux-vdso.so.1 => (0x00007fff775fe000)
    liblttng-ctl.so.0 => /usr/local/lib/liblttng-ctl.so.0 (0x00002b2b016e4000)
    libpopt.so.0 => /lib/x86_64-linux-gnu/libpopt.so.0 (0x00002b2b0190f000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00002b2b01b1b000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00002b2b01d39000)
    librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00002b2b020ff000)
    liburcu.so.3 => /usr/local/lib/liburcu.so.3 (0x00002b2b02307000)
    /lib64/ld-linux-x86-64.so.2 (0x00002b2b014bf000)
Session test created.
Default snapshot output set to: //lttng-traces/test-20141113-093611
Snapshot mode set. Every channel enabled for that session will be set in
overwrite mode and mmap output.
UST channel ch1 enabled for session test
UST event * created in channel ch1
Tracing started for session test
Tracing session test: [active snapshot]
    Trace path:

=== Domain: UST global ===

Buffer type: per UID

Channels:
-------------
- ch1: [enabled]

    Attributes:
      overwrite mode: 1
      subbufers size: 4096
      number of subbufers: 16
      switch timer interval: 0
      read timer interval: 0
      output: mmap()

    Events:
      * (type: tracepoint) [enabled]

Snapshot recorded successfully for session test
1000
[09:36:11.985319272] (+?.?????????) 07f8cf84a38f tracetest:first_tp: { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 0 }
[09:36:11.985857254] (+0.000000501) 07f8cf84a38f tracetest:first_tp: { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 999 }
Snapshot recorded successfully for session test
2000
[09:36:11.985319272] (+?.?????????) 07f8cf84a38f tracetest:first_tp: { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 0 }
[09:36:12.033066130] (+0.000001663) 07f8cf84a38f tracetest:first_tp: { cpu_id = 5 }, { my_string_field = "test", my_integer_field = 999 }
Snapshot recorded successfully for session test
3000
[09:36:11.985319272] (+?.?????????) 07f8cf84a38f tracetest:first_tp: { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 0 }
[09:36:12.086888121] (+0.000001273) 07f8cf84a38f tracetest:first_tp: { cpu_id = 5 }, { my_string_field = "test", my_integer_field = 999 }
Snapshot recorded successfully for session test
4000
[09:36:11.985319272] (+?.?????????) 07f8cf84a38f tracetest:first_tp: { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 0 }
[09:36:12.147378251] (+0.000001257) 07f8cf84a38f tracetest:first_tp: { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 999 }
Snapshot recorded successfully for session test
5000
[09:36:11.985319272] (+?.?????????) 07f8cf84a38f tracetest:first_tp: { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 0 }
[09:36:12.225584696] (+0.000000528) 07f8cf84a38f tracetest:first_tp: { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 999 }
Snapshot recorded successfully for session test
6000
[09:36:11.985319272] (+?.?????????) 07f8cf84a38f tracetest:first_tp: { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 0 }
[09:36:12.288311172] (+0.000000571) 07f8cf84a38f tracetest:first_tp: { cpu_id = 7 }, { my_string_field = "test", my_integer_field = 999 }
Snapshot recorded successfully for session test
7000
[09:36:11.985319272] (+?.?????????) 07f8cf84a38f tracetest:first_tp: { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 0 }
[09:36:12.375789307] (+0.000001238) 07f8cf84a38f tracetest:first_tp: { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 999 }
Snapshot recorded successfully for session test
7000
[09:36:12.032314084] (+?.?????????) 07f8cf84a38f tracetest:first_tp: { cpu_id = 5 }, { my_string_field = "test", my_integer_field = 0 }
[09:36:12.460756779] (+0.000001248) 07f8cf84a38f tracetest:first_tp: { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 999 }
Snapshot recorded successfully for session test
8000
[09:36:12.032314084] (+?.?????????) 07f8cf84a38f tracetest:first_tp: { cpu_id = 5 }, { my_string_field = "test", my_integer_field = 0 }
[09:36:12.540464090] (+0.000001292) 07f8cf84a38f tracetest:first_tp: { cpu_id = 7 }, { my_string_field = "test", my_integer_field = 999 }
Snapshot recorded successfully for session test
9000
[09:36:12.032314084] (+?.?????????) 07f8cf84a38f tracetest:first_tp: { cpu_id = 5 }, { my_string_field = "test", my_integer_field = 0 }
[09:36:12.621992931] (+0.000001298) 07f8cf84a38f tracetest:first_tp: { cpu_id = 5 }, { my_string_field = "test", my_integer_field = 999 }
Waiting for data availability
Tracing stopped for session test
Session test destroyed

>> > Unless I'm mistaken, the behavior of the snapshot mode has changed
>> > between 2.4 and 2.5.
>> >
>> > In the 2.4 series, consecutive snapshots could overlap; that is,
>> > events could be present in multiple snapshots. As of 2.5, an event is
>> > guaranteed to only appear in one snapshot as the events are considered
>> > as having been "consumed" at that point.
>>
>> Hum, no, the behavior is still the same, consecutive snapshots can overlap.
>>
>> Could you tell us on which architecture you are experiencing the problem
>> and the number of CPUs?
>
> The traces appended are from my laptop running Ubuntu 14.04 with "8" CPUs:
> model name : Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
> I have also run this on ARM with 16 cores and on a powerpc.
>
>> Thanks,
>> Julien
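(The tracetest program itself was sent as an attachment and does not appear in
the archived message. For readers trying to reproduce this, a minimal lttng-ust
program producing the tracetest:first_tp events seen above, with
my_string_field = "test" and my_integer_field counting 0 to 999, could look
roughly like the sketch below. Only the provider, event and field names are
taken from the babeltrace output; the file names, argument names and build
commands are illustrative assumptions, not Anders' actual code.)

----------------------------------------------------------------------------------
/* tracetest_tp.h: tracepoint provider declaration (hypothetical layout) */
#undef TRACEPOINT_PROVIDER
#define TRACEPOINT_PROVIDER tracetest

#undef TRACEPOINT_INCLUDE
#define TRACEPOINT_INCLUDE "./tracetest_tp.h"

#if !defined(_TRACETEST_TP_H) || defined(TRACEPOINT_HEADER_MULTI_READ)
#define _TRACETEST_TP_H

#include <lttng/tracepoint.h>

TRACEPOINT_EVENT(
    tracetest,          /* provider name, as seen in the babeltrace output */
    first_tp,           /* event name */
    TP_ARGS(
        char *, my_string_arg,
        int, my_integer_arg
    ),
    TP_FIELDS(
        ctf_string(my_string_field, my_string_arg)
        ctf_integer(int, my_integer_field, my_integer_arg)
    )
)

#endif /* _TRACETEST_TP_H */

#include <lttng/tracepoint-event.h>
----------------------------------------------------------------------------------
/* tracetest_tp.c: compiles the probes into the application */
#define TRACEPOINT_CREATE_PROBES
#define TRACEPOINT_DEFINE
#include "tracetest_tp.h"
----------------------------------------------------------------------------------
/* tracetest.c: emit 1000 events, my_integer_field counting 0..999 */
#include "tracetest_tp.h"

int main(void)
{
    int i;

    for (i = 0; i < 1000; i++)
        tracepoint(tracetest, first_tp, "test", i);

    return 0;
}
----------------------------------------------------------------------------------

Built with the usual statically linked provider setup, for example:

    gcc -c -I. tracetest_tp.c
    gcc -c -I. tracetest.c
    gcc -o tracetest tracetest.o tracetest_tp.o -llttng-ust -ldl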
_______________________________________________
lttng-dev mailing list
[email protected]
http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
