Hi, On 14.06.2017 18:39, Mathieu Desnoyers wrote: >> On 14.06.2017 17:12, Mathieu Desnoyers wrote: >>> Can you provide a copy of the metadata file ? And ideally the data >>> streams too ? This would give us a better idea of what is happening. >>> >>> Do you perform kernel or user-space tracing ? Do you trace huge >>> sequences of bytes within your own tracepoints ? >> I perform kernel traceing only, in this case limited to syscalls, >> sched*, block* and irq*. No user-space tracepoints. >> >> I didn't know the metadata file was plain text, I had a quick look into >> it and noticed corruption already, with random garbage data inserted all >> over the place. I'm surprised babeltrace didn't choke on the metadata >> already. > The lttng metadata is "packetized plain-text". What you see is plain-text in > a transport layer which is binary. This explains the "garbage" you see: > those are binary headers for packets. Use babeltrace -o ctf-metadata > to extract the text-only metadata (which is also valid metadata under CTF). > Both packetized and pure text metadata are allowed.
Upps, right, red herring then. I've read the overview of the CTF format
now (great docs) and understand things a bit better.
>> Any idea what can cause the corrupted trace?
> Based on your babeltrace backtrace, the possible culprits would be the
> events that have a sequence (variable-sized array):
>
> syscalls: select, poll, ppoll, pselect6, epoll_wait, epoll_pwait
>
> block_rq_issue, block_rq_insert, block_rq_complete, block_rq_requeue,
> block_rq_abort.
>
> There are a few approaches to cornering the issue. You can try reproducing
> on your workload/config by only enabling one of these events at a time.
> Just knowing which event(s) is/are the culprit would be a good start.
>
> Another possibility would be to send us a trace reproducing the issue
> with only those events enabled, which should not contain confidential
> info about your system.
I've added some debug statements to babeltrace now. The culprit in this
particular case is the first block_rq_complete event, the __cmd_length
field contains a large value (3040877592). __cmd_length is used as the
length for the _cmd sequence, and then of course allocating space for
that sequence fails.
Any idea what can cause __cmd_length to be bogus?
For reference, this is the event declaration:
event {
name = "block_rq_complete";
id = 4;
stream_
fields := struct {
integer { size = 32; align = 32; signed = 0; encoding = none;
base =
10; } _dev;
integer { size = 64; align = 64; signed = 0; encoding = none;
base =
10; } _sector;
integer { size = 32; align = 32; signed = 0; encoding = none;
base =
10; } _nr_sector;
integer { size = 32; align = 32; signed = 1; encoding = none;
base =
10; } _errors;
integer { size = 32; align = 32; signed = 0; encoding = none;
base =
10; } _rwbs;
integer { size = 32; align = 32; signed = 0; encoding = none;
base =
10; } __cmd_length;
integer { size = 8; align = 8; signed = 0; encoding = none;
base = 16;
} _cmd[ __cmd_length ];
};
};
Also, does anyone know which block_rq_ events are actually needed to
display the IO load in TraceCompass?
Regards,
Thomas
--
Thomas McGuire | [email protected] | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
smime.p7s
Description: S/MIME Cryptographic Signature
_______________________________________________ lttng-dev mailing list [email protected] https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
