On Wed, Nov 12, 2014 at 6:11 PM, Julien Desfossez <[email protected]> wrote:
> Hi,
>
> On 14-11-12 05:30 PM, Jérémie Galarneau wrote:
>> On Wed, Nov 12, 2014 at 7:39 AM, Anders Wallin <[email protected]> wrote:
>>> Hi,
>>>
>>> using userspace snapshot mode in 2.5 seems to be broken.
>>>
>>> I have a small stupid application (tracetest) generating 1000 events.
>>>
>>> Running the script (the below is a simplified version of the script):
>>> ----------------------------------------------------------------------------------
>>> #!/bin/bash
>>> LTTNG='lttng -n'
>>> lttng-sessiond --no-kernel &
>>
>> You should use the "-d" flag to launch the sessiond as a daemon. Using
>> "&" might cause the "create" command to launch a second session daemon
>> since the first one would not have had the time to create its
>> command sockets.
>>
>>>
>>> $LTTNG create test --snapshot
>>> $LTTNG enable-channel -u --subbuf-size 64k --num-subbuf 16 --overwrite ch1
>>> $LTTNG enable-event -u -c ch1 "*"
>>> $LTTNG start
>>> $LTTNG list ch1
>>>
>>> mkdir -p snapshot/
>>>
>>> for i in $(seq 1 $LOOPS); do
>>> ./tracetest
>>> $LTTNG snapshot record -m 1G
>>>
>>> # print no of events and first and last event
>>> last=$(ls -1drt $HOME/lttng-traces/$SESSION*/* | tail -1)
>>> babeltrace $last > snapshot/$i.bt
>>> cat snapshot/$i.bt | wc -l
>>> cat snapshot/$i.bt | head -1
>>> cat snapshot/$i.bt | tail -1
>>> done
>>>
>>> $LTTNG stop
>>> $LTTNG destroy $SESSION
>>> ----------------------------------------------------------------------------------
>>>
>>> The result of the run looks like this:
>>> ./test.run
>>> Error: consumer err socket second poll error
>>> Error: Health error occurred in thread_manage_consumer
>>> PERROR - 13:36:59.685167 [23772/23781]: bind inet: Address already in use
>>> (in lttcomm_bind_inet_sock() at inet.c:109)
>>> Warning: Another session daemon is using this JUL port. JUL support will be
>>> deactivated to prevent interfering with the tracing.
>>>
>>
>> Is there a reason why you run two session daemons? I'm guessing this
>> is related to my "& vs -d" comment above.
>>
>>> lttng (LTTng Trace Control) 2.5.2 - Fumisterie
>>> BabelTrace Trace Viewer and Converter 1.2.4
>>> linux-vdso.so.1 => (0x00007fffba966000)
>>> liblttng-ctl.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ctl.so.0 (0x00007fbfe4b1d000)
>>> librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fbfe4915000)
>>> libxml2.so.2 => /usr/lib/x86_64-linux-gnu/libxml2.so.2 (0x00007fbfe45ae000)
>>> libpopt.so.0 => /lib/x86_64-linux-gnu/libpopt.so.0 (0x00007fbfe43a2000)
>>> libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fbfe4184000)
>>> libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fbfe3dbd000)
>>> /lib64/ld-linux-x86-64.so.2 (0x00007fbfe4fac000)
>>> liburcu.so.2 => /usr/lib/x86_64-linux-gnu/liburcu.so.2 (0x00007fbfe3bb6000)
>>> libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fbfe39b2000)
>>> libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fbfe3798000)
>>> liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007fbfe3576000)
>>> libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fbfe3270000)
>>>
>>> Session test created.
>>> Default snapshot output set to:
>>> /home/awallin/lttng-traces/test-20141112-133700
>>> Snapshot mode set. Every channel enabled for that session will be set in
>>> overwrite mode and mmap output.
>>> UST channel ch1 enabled for session test
>>> UST event * created in channel ch1
>>> Tracing started for session test
>>> Tracing session test: [active snapshot]
>>> Trace path:
>>> Live timer interval (usec): 4294967295
>>>
>>> === Domain: UST global ===
>>>
>>> Channels:
>>> -------------
>>> - ch1: [enabled]
>>>
>>> Attributes:
>>> overwrite mode: 1
>>> subbufers size: 4096
>>> number of subbufers: 16
>>> switch timer interval: 0
>>> read timer interval: 0
>>> trace file count: 0
>>> trace file size (bytes): 0
>>> output: mmap()
>>>
>>> Events:
>>> * (type: tracepoint) [enabled]
>>>
>>> Snapshot recorded successfully for session test
>>> 84
>>> [13:37:00.852615950] (+?.?????????) arn-awallin-mint-l3 ust_baddr_statedump:soinfo: { cpu_id = 4 }, { baddr = 0x7FFFC6BC7000, sopath = "[vdso]", size = 0, mtime = -1 }
>>> [13:37:00.853798949] (+0.000000805) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 6 }, { my_string_field = "test", my_integer_field = 999 }
>>> Snapshot recorded successfully for session test
>>> 157
>>> [13:37:00.853735195] (+?.?????????) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 6 }, { my_string_field = "test", my_integer_field = 927 }
>>> [13:37:00.894839116] (+0.000000299) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 5 }, { my_string_field = "test", my_integer_field = 999 }
>>> Snapshot recorded successfully for session test
>>> 168
>>> [13:37:00.853735195] (+?.?????????) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 6 }, { my_string_field = "test", my_integer_field = 927 }
>>> [13:37:00.929447330] (+0.000000305) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 5 }, { my_string_field = "test", my_integer_field = 999 }
>>> Snapshot recorded successfully for session test
>
> Something is wrong from here:
>>> 168
>>> [13:37:00.894411117] (+?.?????????) arn-awallin-mint-l3 ust_baddr_statedump:soinfo: { cpu_id = 4 }, { baddr = 0x7FFF6BF81000, sopath = "[vdso]", size = 0, mtime = -1 }
>>> [13:37:00.972035546] (+0.000000597) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 6 }, { my_string_field = "test", my_integer_field = 999 }
>>> Snapshot recorded successfully for session test
>>> 168
>>> [13:37:00.894411117] (+?.?????????) arn-awallin-mint-l3 ust_baddr_statedump:soinfo: { cpu_id = 4 }, { baddr = 0x7FFF6BF81000, sopath = "[vdso]", size = 0, mtime = -1 }
>>> [13:37:01.014060940] (+0.000000341) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 7 }, { my_string_field = "test", my_integer_field = 999 }
>>> Snapshot recorded successfully for session test
>>> 168
>>> [13:37:00.894411117] (+?.?????????) arn-awallin-mint-l3 ust_baddr_statedump:soinfo: { cpu_id = 4 }, { baddr = 0x7FFF6BF81000, sopath = "[vdso]", size = 0, mtime = -1 }
>>> [13:37:01.052555877] (+0.000000872) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 6 }, { my_string_field = "test", my_integer_field = 999 }
> ... to here.
>
> The beginning timestamp is the same and the number of events is the same,
> but the end timestamp moves. We also see this pattern below.
> I would be really interested to see the whole snapshots when this happens.
> It seems like we are not overwriting the beginning of the ring buffer
> in some conditions. Could you send us your tracetest program and the
> script so we can try to reproduce the problem?
>
>>> Snapshot recorded successfully for session test
>>> 168
>>> [13:37:01.014035993] (+?.?????????) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 7 }, { my_string_field = "test", my_integer_field = 927 }
>>> [13:37:01.098620808] (+0.000001404) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 4 }, { my_string_field = "test", my_integer_field = 999 }
>>> Snapshot recorded successfully for session test
>>> 168
>>> [13:37:01.097087388] (+?.?????????) arn-awallin-mint-l3 ust_baddr_statedump:soinfo: { cpu_id = 6 }, { baddr = 0x7FFF4BFFE000, sopath = "[vdso]", size = 0, mtime = -1 }
>>> [13:37:01.129839830] (+0.000000533) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 7 }, { my_string_field = "test", my_integer_field = 999 }
>>> Snapshot recorded successfully for session test
>>> 168
>>> [13:37:01.097087388] (+?.?????????) arn-awallin-mint-l3 ust_baddr_statedump:soinfo: { cpu_id = 6 }, { baddr = 0x7FFF4BFFE000, sopath = "[vdso]", size = 0, mtime = -1 }
>>> [13:37:01.129839830] (+0.000000533) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 7 }, { my_string_field = "test", my_integer_field = 999 }
>>> Snapshot recorded successfully for session test
>>> 168
>>> [13:37:01.097087388] (+?.?????????) arn-awallin-mint-l3 ust_baddr_statedump:soinfo: { cpu_id = 6 }, { baddr = 0x7FFF4BFFE000, sopath = "[vdso]", size = 0, mtime = -1 }
>>> [13:37:01.172829105] (+0.000000575) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 5 }, { my_string_field = "test", my_integer_field = 999 }
>>> Snapshot recorded successfully for session test
>>> 168
>>> [13:37:01.172021557] (+?.?????????) arn-awallin-mint-l3 ust_baddr_statedump:soinfo: { cpu_id = 7 }, { baddr = 0x7FFF653FE000, sopath = "[vdso]", size = 0, mtime = -1 }
>>> [13:37:01.220074165] (+0.000000553) arn-awallin-mint-l3 tracetest:first_tp: { cpu_id = 6 }, { my_string_field = "test", my_integer_field = 999 }
>>> Waiting for data availability
>>> Tracing stopped for session test
>>> Session test destroyed
>>>
>>> The result from the run on older versions of lttng creates bigger and bigger
>>> event traces until reaching the max size.
> We see this pattern here, and the max size seems to be 168 events; was it
> similar in your previous tests?
>
>> Unless I'm mistaken, the behavior of the snapshot mode has changed
>> between 2.4 and 2.5.
>>
>> In the 2.4 series, consecutive snapshots could overlap; that is,
>> events could be present in multiple snapshots. As of 2.5, an event is
>> guaranteed to only appear in one snapshot as the events are considered
>> as having been "consumed" at that point.
>
> Hum, no, the behavior is still the same, consecutive snapshots can overlap.
Ah, alright then. Thanks for the correction!

Jérémie

> Could you tell us on which architecture you are experiencing the problem
> and the number of CPUs?
>
> Thanks,
>
> Julien

--
Jérémie Galarneau
EfficiOS Inc.
http://www.efficios.com

_______________________________________________
lttng-dev mailing list
[email protected]
http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
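For reference, a minimal sketch of the same reproducer with the "-d" suggestion applied: the session daemon is daemonized instead of backgrounded with "&", so the later "create" command does not race it and spawn a second sessiond. SESSION and LOOPS were left undefined in the simplified script above, so the values used here are placeholders.
----------------------------------------------------------------------------------
#!/bin/bash
# Placeholder values; the original (non-simplified) script defines these itself.
SESSION=test
LOOPS=20
LTTNG='lttng -n'

# Daemonize the session daemon (-d) so its command sockets exist before
# the first lttng command runs.
lttng-sessiond --no-kernel -d

$LTTNG create $SESSION --snapshot
$LTTNG enable-channel -u --subbuf-size 64k --num-subbuf 16 --overwrite ch1
$LTTNG enable-event -u -c ch1 "*"
$LTTNG start

mkdir -p snapshot/

for i in $(seq 1 $LOOPS); do
    ./tracetest
    $LTTNG snapshot record -m 1G

    # Print the event count and the first and last event of the newest snapshot.
    last=$(ls -1drt "$HOME"/lttng-traces/$SESSION*/* | tail -1)
    babeltrace "$last" > snapshot/$i.bt
    wc -l < snapshot/$i.bt
    head -1 snapshot/$i.bt
    tail -1 snapshot/$i.bt
done

$LTTNG stop
$LTTNG destroy $SESSION
----------------------------------------------------------------------------------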
