> -----Original Message----- > From: Jonathan Rajotte-Julien [mailto:jonathan.rajotte-jul...@efficios.com] > Sent: Tuesday, March 13, 2018 4:35 PM > To: THEUNISSEN Rolf <rolf.theunis...@altran.com> > Cc: lttng-dev@lists.lttng.org > Subject: Re: [lttng-dev] [lttng-ust] Missing events when just before a process > exits > > Hi Rolf, > > On Tue, Mar 13, 2018 at 10:48:17AM +0000, THEUNISSEN Rolf wrote: > > Hi, > > > > I am currently tracing many processes with LTTng-UST on a system under > heavy load, with 2 CPUs. > > > > The traces seem to be missing ust-events near the end of a trace. After > doing some analysis, it seems that events are missing from the other CPU > then the process exits on. More concretely: > > > > During the lifetime of a process: > > - Process X executes on CPU 0,1: all events are in the trace Just > > before the process is about to exit: > > - Process X executes on CPU 0: missing events from the trace > > How are you validating this?
I am also using kernel tracing, to trace the scheduler. In TraceCompass I can view the states of the processes and CPUs. > > > A few mili-seconds later: > > - Process X executes on CPU1 and exits: The last events are in the > > trace > > This is not expected. Do you have a reproducer so we can check what might > be happening? I am not allowed to provide details about my current traces. And it is hard to make a small example, but I got some more clues what is happening, see below. > > We will need the version for lttng-ust, lttng-tools, babeltrace. I am using version 2.10, for analyzing the traces I am using TraceCompass Kernel version: 3.10.62 > > Note that you can run your application with lttng-ust in debug mode using > LTTNG_UST_DEBUG=y. Debug statements will be outputted on stderr. > > e.g: > $ LTTNG_UST_DEBUG=y ./my_sample_app > Enableling this debugging flag gave me a clue what is happening. For all processes, the registration is going OK. But for some processes, the un-registration is not logged. On closer analysis (now with tracepoints from SystemTap), I see that the processes for which the events are missing and the un-registration is missing are killed by SIGKILL signal. This probably explains why some data can be lost. But maybe strange that only messages get lost that were traced on a different CPU. I remain puzzled why the SIGKILL arrives at the process, as a SIGTERM is sent to the process. Somehow the kernel decides to deliver an SIGKILL instead of a SIGTERM. Rolf > Cheers > > -- > Jonathan Rajotte-Julien > EfficiOS _______________________________________________ lttng-dev mailing list lttng-dev@lists.lttng.org https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev