On Wed, Jul 24, 2013 at 10:48 AM, Amit Margalit <[email protected]> wrote: > Hi, > > Access beyond end of file seems to be the issue IIUC. > > Let me try to give as much context as I can and answer your questions one by > one: > > Versions > > lttng-ust 2.2.0, lttng-tools 2.2.0, babeltrace - I used git hash > cc26a15ac355d886bb507955fdaec9b218024713 with a patch on babeltrace.i.in. > > [I cannot supply the patch as this requires my employer's (IBM) approval - I > am in the process of obtaining a permanent contribution approval, but until > then...] > > However, I can tell you exactly what it does: > 1. Adds a typedef for 'ssize_t' > 2. Added a call to _bt_ctf_get_decl_from_def() around the argument supplied > to _bt_ctf_get_int_len() > 3. Changed all use of {} in string ".format()" calls to add numbers (e.g. > {0} instead of {}) - as this doesn't work for Python 2.6. > > I am waiting for the contribution approval and then I'll submit a patch. > > To provide a trace sample, I'll need a special approval, which could take > very long. > > System Setup > > Processor: Xeon-family 6-cores (12 if you count HT) > RAM: 24GB of RAM, page size is 4KB. > OS: Linux 2.6.32.12-211_11_4_0 based on an MCP (IBM's SuSE derivative) > build. > > Tracing Setup > > Same system used to generate, collect and read the traces. > 100% user-space. > > Triggering > > This is not readily reproducible. Some context: > > We have an in-house trace mechanism including a code instrumentor and a > curses-based viewer (actually urwid+python). > My current project is to embed LTTng logging and trace capabilites into > several open-source projects, because we couldn't use our in-house tracer > without having GPL issues. > I have written a back-end that enables our in-house viewer to display Lttng > traces, by linking with libbabeltrace (allowed by its LGPL license). > > As far as I can tell SIGBUS happens to me when I was doing a search. This > literally reads the events in sequence and tries to compare some fields > against some criteria. > > And here's the main reason why I think this is happening - I was adding a > trace (using add_traces_recursive(trace_dir,'ctf')) while the session is > active (i.e. not stopped). >
I may be mistaken, but I don't think Babeltrace supports reading growing files. Julien may be working on that as part of the live tracing feature. > I can see 2 problematic scenarios here: > > Scenario 1 > > When I add the trace to the context babeltrace performs mmap() according to > the size of the file(s) at that moment. > The files keep growing. > When I try to access new packets they may lie beyond the end of the mmap'ed > region. I think this should lead to SIGSEGV and not to SIGBUS. > > Scenario 2 > > Due to buffering issues, I could be having a race with the session daemon, > where my code reads the beginning of a packet that was already written to > the file, and try to access the contents, which were not yet written. Again > - could happen around the end of the file, with the session still active. > > Amit > > Amit Margalit > IBM XIV - Storage Reinvented > XIV-NAS Development Team > Tel. 03-689-7774 > Fax. 03-689-7230 > > > > From: Mathieu Desnoyers <[email protected]> > To: Amit Margalit/Israel/IBM@IBMIL > Cc: [email protected] > Date: 07/24/2013 04:20 PM > Subject: Re: [lttng-dev] Getting SIGBUS in babeltrace - occasionally > ________________________________ > > > > * Mathieu Desnoyers ([email protected]) wrote: >> * Amit Margalit ([email protected]) wrote: >> > Hi, >> > >> > I am getting occasional SIGBUS inside _aligned_integer_read() in >> > ./formats/ctf/types/integer.c >> > >> > Here is the backtrace, maybe someone could take a look and suggest >> > something: >> >> It looks like you're hitting the internal checks for overflow on the >> buffer boundaries. > > Please forget about my previous email, I'm utterly wrong. That should > teach me to reply before my first morning coffee ;) > > We indeed have internal checks for overflows in babeltrace, but those > are not triggering SIGBUS ever, they return failure gracefully. > > What seems to happen in your case is documented here: > > mmap(2) manpage: > > SIGBUS Attempted access to a portion of the buffer that does not > corre‐ > spond to the file (for example, beyond the end of the > file, > including the case where another process has truncated > the > file). > > I'm curious to learn which versions of babeltrace/ust/modules you are > using, if you modified them (and how), and if you can get us a sample > trace that triggers the issue. > > Also, a bit of context about your setup: is this 32-bit or 64-bit > user-space, which OS. > > Hrm. > > By any chance, is it possible that you record your trace on a platform > with 4kB pages, and run babeltrace on it on a platform having 64kB > pages? Everything points me to include/babeltrace/mmap-align.h > mmap_align_addr(). We might be mmaping beyond the end of file in this > case. > > Thanks! > > Mathieu > >> >> On which babeltrace version can you reproduce it ? Did you do any >> modification to babeltrace on your own on top ? >> >> Are you getting this with a lttng-ust or lttng-modules trace ? If yes, >> which version of the tool has generated the trace ? >> >> Can you provide a sample trace that triggers this issue ? >> >> Can you give us the detailed sequence of steps you use to reproduce ? >> >> Thanks, >> >> Mathieu >> >> > #0 _aligned_integer_read (definition=0x954320, ppos=0x106e028) at >> > integer.c:81 >> > #1 ctf_integer_read (ppos=0x106e028, definition=0x954320) at >> > integer.c:224 >> > #2 0x00007ffff070e36e in generic_rw (definition=<optimized out>, >> > pos=0x106e028) at ../include/babeltrace/types.h:133 >> > #3 bt_struct_rw (ppos=0x106e028, definition=0x9542e0) at struct.c:56 >> > #4 0x00007ffff13f7fda in generic_rw (definition=<optimized out>, >> > pos=0x106e028) at ../../include/babeltrace/types.h:133 >> > #5 ctf_packet_seek (stream_pos=0x106e028, index=<optimized out>, >> > whence=<optimized out>) at ctf.c:860 >> > #6 0x00007ffff070a3f1 in seek_file_stream_by_timestamp >> > (cfs=cfs@entry=0x106cf90, timestamp=timestamp@entry=1374636821498972926) >> > at iterator.c:141 >> > #7 0x00007ffff070aa96 in seek_ctf_trace_by_timestamp >> > (stream_heap=0x176d310, timestamp=1374636821498972926, tin=<optimized >> > out>) at iterator.c:188 >> > #8 bt_iter_set_pos (iter=iter@entry=0xfdafa0, iter_pos=0x2bf59f0) at >> > iterator.c:439 >> > #9 0x00007ffff1624f01 in _wrap__bt_iter_set_pos (self=<optimized out>, >> > args=<optimized out>) at babeltrace_wrap.c:3805 >> > #10 0x00007ffff7b28c0c in call_function (oparg=<optimized out>, >> > pp_stack=<optimized out>) at Python/ceval.c:3679 >> > #11 PyEval_EvalFrameEx (f=0x1c243c0, throwflag=<optimized out>) at >> > Python/ceval.c:2370 >> > #12 0x00007ffff7b2e0d2 in PyEval_EvalCodeEx (co=0xa8e5d0, >> > globals=<optimized out>, locals=<optimized out>, args=0x2088cc8, >> > argcount=0, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) >> > at Python/ceval.c:2942 >> > . >> > . >> > . >> > >> > Any suggestions are welcome. >> > >> > Amit Margalit >> > IBM XIV - Storage Reinvented >> > XIV-NAS Development Team >> > Tel. 03-689-7774 >> > Fax. 03-689-7230 >> > _______________________________________________ >> > lttng-dev mailing list >> > [email protected] >> > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev >> >> >> -- >> Mathieu Desnoyers >> EfficiOS Inc. >> http://www.efficios.com > > -- > Mathieu Desnoyers > EfficiOS Inc. > http://www.efficios.com > > > > _______________________________________________ > lttng-dev mailing list > [email protected] > http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev > -- Jérémie Galarneau EfficiOS Inc. http://www.efficios.com _______________________________________________ lttng-dev mailing list [email protected] http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
