Re: [lttng-dev] Allocation failures with babeltrace and TraceCompass - corrupt trace?

2017-06-28 Thread Thomas McGuire
Hi,

On 27.06.2017 21:09, Mathieu Desnoyers wrote:
> We would need links to the actual tarballs/build scripts and patches of
> your specific distribution, for the version you use. The ptxdist packages
> on http://www.pengutronix.de/software/ptxdist/ appear to target kernel 3.10,
> but yours seems to be based on 4.9.

We are using an unpatched ptxdist-2017.05.0, it does seem to support
kernel 4.9 by setting the KERNEL_VERSION config option.
I'm sorry to say that I can't provide the tarball, patches and build
scripts.

> Without this information, there is very little we can do.

Ok, I understand. Thank you for helping so far. I learned a bit about
lttng, and narrowed down the problem to the block_rq_complete event,
which I can simply disable. The schedule events all work, and the
schedule timeline is the most important bit for me and quite useful already.

> You should probably ask your distribution vendor to look into this as well.

Right, that's worth a try.

Thanks,
Thomas
-- 
Thomas McGuire | thomas.mcgu...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts



smime.p7s
Description: S/MIME Cryptographic Signature
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] Allocation failures with babeltrace and TraceCompass - corrupt trace?

2017-06-28 Thread Thomas McGuire
Hi,

On 27.06.2017 16:25, Michael Jeanson wrote:
> I forgot to ask for your kernel .config file.

Attached is the content of /proc/config.gz.

Regards,
Thomas
-- 
Thomas McGuire | thomas.mcgu...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts
#
# Automatically generated file; DO NOT EDIT.
# Linux/arm 4.9.28-20170428-1 Kernel Configuration
#
CONFIG_ARM=y
CONFIG_ARM_HAS_SG_CHAIN=y
CONFIG_MIGHT_HAVE_PCI=y
CONFIG_SYS_SUPPORTS_APM_EMULATION=y
CONFIG_HAVE_PROC_CPU=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_VECTORS_BASE=0x
CONFIG_ARM_PATCH_PHYS_VIRT=y
CONFIG_GENERIC_BUG=y
CONFIG_PGTABLE_LEVELS=2
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
# CONFIG_SWAP is not set
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
CONFIG_CROSS_MEMORY_ATTACH=y
CONFIG_FHANDLE=y
CONFIG_USELIB=y
# CONFIG_AUDIT is not set
CONFIG_HAVE_ARCH_AUDITSYSCALL=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_SHOW_LEVEL=y
CONFIG_HARDIRQS_SW_RESEND=y
CONFIG_GENERIC_IRQ_CHIP=y
CONFIG_IRQ_DOMAIN=y
CONFIG_HANDLE_DOMAIN_IRQ=y
# CONFIG_IRQ_DOMAIN_DEBUG is not set
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_GENERIC_CLOCKEVENTS=y

#
# Timers subsystem
#
CONFIG_HZ_PERIODIC=y
# CONFIG_NO_HZ_IDLE is not set
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_VIRT_CPU_ACCOUNTING_GEN is not set
# CONFIG_IRQ_TIME_ACCOUNTING is not set
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set

#
# RCU Subsystem
#
CONFIG_PREEMPT_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
# CONFIG_TASKS_RCU is not set
CONFIG_RCU_STALL_COMMON=y
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_RCU_EXPEDITE_BOOT is not set
CONFIG_BUILD_BIN2C=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=18
CONFIG_NMI_LOG_BUF_SHIFT=13
CONFIG_GENERIC_SCHED_CLOCK=y
CONFIG_CGROUPS=y
# CONFIG_MEMCG is not set
# CONFIG_BLK_CGROUP is not set
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
# CONFIG_CFS_BANDWIDTH is not set
# CONFIG_RT_GROUP_SCHED is not set
# CONFIG_CGROUP_PIDS is not set
# CONFIG_CGROUP_FREEZER is not set
# CONFIG_CPUSETS is not set
# CONFIG_CGROUP_DEVICE is not set
# CONFIG_CGROUP_CPUACCT is not set
# CONFIG_CGROUP_PERF is not set
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_CHECKPOINT_RESTORE is not set
CONFIG_NAMESPACES=y
CONFIG_UTS_NS=y
CONFIG_IPC_NS=y
# CONFIG_USER_NS is not set
CONFIG_PID_NS=y
CONFIG_NET_NS=y
# CONFIG_SCHED_AUTOGROUP is not set
# CONFIG_SYSFS_DEPRECATED is not set
CONFIG_RELAY=y
# CONFIG_BLK_DEV_INITRD is not set
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
CONFIG_HAVE_UID16=y
CONFIG_BPF=y
CONFIG_EXPERT=y
CONFIG_UID16=y
CONFIG_MULTIUSER=y
# CONFIG_SGETMASK_SYSCALL is not set
CONFIG_SYSFS_SYSCALL=y
# CONFIG_SYSCTL_SYSCALL is not set
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_ALL is not set
# CONFIG_KALLSYMS_ABSOLUTE_PERCPU is not set
CONFIG_KALLSYMS_BASE_RELATIVE=y
CONFIG_PRINTK=y
CONFIG_PRINTK_NMI=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
# CONFIG_BPF_SYSCALL is not set
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_ADVISE_SYSCALLS=y
# CONFIG_USERFAULTFD is not set
CONFIG_MEMBARRIER=y
CONFIG_EMBEDDED=y
CONFIG_HAVE_PERF_EVENTS=y
CONFIG_PERF_USE_VMALLOC=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
# CONFIG_VM_EVENT_COUNTERS is not set
# CONFIG_COMPAT_BRK is not set
CONFIG_SLAB=y
# CONFIG_SLUB is not set
# CONFIG_SLOB is not set
# CONFIG_SLAB_FREELIST_RANDOM is not set
# CONFIG_SYSTEM_DATA_VERIFICATION is not set
# CONFIG_PROFILING is not set
CONFIG_TRACEPOINTS=y
CONFIG_HAVE_OPROFILE=y
CONFIG_KPROBES=y
# CONFIG_JUMP_LABEL is not set
CONFIG_UPROBES=y
# CONFIG_HAVE_64BIT_ALIGNED_ACCESS is not set
CONFIG_ARCH_USE_BUILTIN_BSWAP=y
CONFIG_KRETPROBES=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_OPTPROBES=y
CONFIG_HAVE_NMI=y
CONFIG_HAVE_ARCH_TRACEHOOK=y

Re: [lttng-dev] Allocation failures with babeltrace and TraceCompass - corrupt trace?

2017-06-27 Thread Mathieu Desnoyers

- On Jun 26, 2017, at 5:55 AM, Thomas McGuire thomas.mcgu...@kdab.com wrote:

> Hi Michael,
> 
> On 23.06.2017 21:55, Michael Jeanson wrote:
>> I see from the metadata file you provided that your kernel version is
>> 4.9.28-20170428-1, is it built from vanilla kernel sources? If not,
>> could you point us to a git repo or source archive? It would help a lot
>> to figure this out.
> 
> The Linux system is built with ptxdist, which is similar to Yocto. I'm
> not very familiar with ptxdist.
> During the build process, it fetches
> https://www.kernel.org/pub/linux/kernel/v4.x/linux-4.9.tar.xz.
> There are 25 patches in patches/linux-4.9/ in the ptxdist root dir,
> which I assume are all the patches that are applied.
> 
> None of the patches touch the block subsystem, and only one patch
> touches include/trace/, adding a new siox tracepoint. One patch sets
> "20170428-1" as the EXTRAVERSION. I don't know where SUBLEVEL is set to 28.
> 
> include/trace/events/block.h is the same as
> https://github.com/torvalds/linux/blob/v4.9/include/trace/events/block.h.
> 
> That's all relevant information about the kernel I could figure out so far.
> 
> Any other information that would help?

Hi Thomas,

We would need links to the actual tarballs/build scripts and patches of
your specific distribution, for the version you use. The ptxdist packages
on http://www.pengutronix.de/software/ptxdist/ appear to target kernel 3.10,
but yours seems to be based on 4.9.

Without this information, there is very little we can do.

You should probably ask your distribution vendor to look into this as well.

Thanks,

Mathieu

> 
> Regards,
> Thomas
> --
> Thomas McGuire | thomas.mcgu...@kdab.com | Senior Software Engineer
> KDAB (Deutschland) GmbH KG, a KDAB Group company
> Tel: +49-30-521325470
> KDAB - The Qt Experts
> 
> 
> ___
> lttng-dev mailing list
> lttng-dev@lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] Allocation failures with babeltrace and TraceCompass - corrupt trace?

2017-06-27 Thread Michael Jeanson
On 2017-06-26 05:55, Thomas McGuire wrote:
> Hi Michael,
> 
> On 23.06.2017 21:55, Michael Jeanson wrote:
>> I see from the metadata file you provided that your kernel version is
>> 4.9.28-20170428-1, is it built from vanilla kernel sources? If not,
>> could you point us to a git repo or source archive? It would help a lot
>> to figure this out.
> 
> The Linux system is built with ptxdist, which is similar to Yocto. I'm
> not very familiar with ptxdist.
> During the build process, it fetches
> https://www.kernel.org/pub/linux/kernel/v4.x/linux-4.9.tar.xz.
> There are 25 patches in patches/linux-4.9/ in the ptxdist root dir,
> which I assume are all the patches that are applied.
> 
> None of the patches touch the block subsystem, and only one patch
> touches include/trace/, adding a new siox tracepoint. One patch sets
> "20170428-1" as the EXTRAVERSION. I don't know where SUBLEVEL is set to 28.
> 
> include/trace/events/block.h is the same as
> https://github.com/torvalds/linux/blob/v4.9/include/trace/events/block.h.
> 
> That's all relevant information about the kernel I could figure out so far.
> 
> Any other information that would help?
> 
> Regards,
> Thomas

Hi Thomas,

I forgot to ask for your kernel .config file.

Michael
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] Allocation failures with babeltrace and TraceCompass - corrupt trace?

2017-06-26 Thread Thomas McGuire
Hi Michael,

On 23.06.2017 21:55, Michael Jeanson wrote:
> I see from the metadata file you provided that your kernel version is
> 4.9.28-20170428-1, is it built from vanilla kernel sources? If not,
> could you point us to a git repo or source archive? It would help a lot
> to figure this out.

The Linux system is built with ptxdist, which is similar to Yocto. I'm
not very familiar with ptxdist.
During the build process, it fetches
https://www.kernel.org/pub/linux/kernel/v4.x/linux-4.9.tar.xz.
There are 25 patches in patches/linux-4.9/ in the ptxdist root dir,
which I assume are all the patches that are applied.

None of the patches touch the block subsystem, and only one patch
touches include/trace/, adding a new siox tracepoint. One patch sets
"20170428-1" as the EXTRAVERSION. I don't know where SUBLEVEL is set to 28.

include/trace/events/block.h is the same as
https://github.com/torvalds/linux/blob/v4.9/include/trace/events/block.h.

That's all relevant information about the kernel I could figure out so far.

Any other information that would help?

Regards,
Thomas
-- 
Thomas McGuire | thomas.mcgu...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts



smime.p7s
Description: S/MIME Cryptographic Signature
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] Allocation failures with babeltrace and TraceCompass - corrupt trace?

2017-06-23 Thread Thomas McGuire
Hi,

On 14.06.2017 18:39, Mathieu Desnoyers wrote:
>> On 14.06.2017 17:12, Mathieu Desnoyers wrote:
>>> Can you provide a copy of the metadata file ? And ideally the data
>>> streams too ? This would give us a better idea of what is happening.
>>>
>>> Do you perform kernel or user-space tracing ? Do you trace huge
>>> sequences of bytes within your own tracepoints ?
>> I perform kernel traceing only, in this case limited to syscalls,
>> sched*, block* and irq*. No user-space tracepoints.
>>
>> I didn't know the metadata file was plain text, I had a quick look into
>> it and noticed corruption already, with random garbage data inserted all
>> over the place. I'm surprised babeltrace didn't choke on the metadata
>> already.
> The lttng metadata is "packetized plain-text". What you see is plain-text in
> a transport layer which is binary. This explains the "garbage" you see:
> those are binary headers for packets. Use babeltrace -o ctf-metadata
> to extract the text-only metadata (which is also valid metadata under CTF).
> Both packetized and pure text metadata are allowed.

Upps, right, red herring then. I've read the overview of the CTF format
now (great docs) and understand things a bit better.

>> Any idea what can cause the corrupted trace?
> Based on your babeltrace backtrace, the possible culprits would be the
> events that have a sequence (variable-sized array):
> 
> syscalls: select, poll, ppoll, pselect6, epoll_wait, epoll_pwait
> 
> block_rq_issue, block_rq_insert, block_rq_complete, block_rq_requeue, 
> block_rq_abort.
> 
> There are a few approaches to cornering the issue. You can try reproducing
> on your workload/config by only enabling one of these events at a time.
> Just knowing which event(s) is/are the culprit would be a good start.
> 
> Another possibility would be to send us a trace reproducing the issue
> with only those events enabled, which should not contain confidential
> info about your system.

I've added some debug statements to babeltrace now. The culprit in this
particular case is the first block_rq_complete event, the __cmd_length
field contains a large value (3040877592). __cmd_length is used as the
length for the _cmd sequence, and then of course allocating space for
that sequence fails.

Any idea what can cause __cmd_length to be bogus?

For reference, this is the event declaration:
event {
name = "block_rq_complete";
id = 4;
stream_
fields := struct {
integer { size = 32; align = 32; signed = 0; encoding = none; 
base =
10; } _dev;
integer { size = 64; align = 64; signed = 0; encoding = none; 
base =
10; } _sector;
integer { size = 32; align = 32; signed = 0; encoding = none; 
base =
10; } _nr_sector;
integer { size = 32; align = 32; signed = 1; encoding = none; 
base =
10; } _errors;
integer { size = 32; align = 32; signed = 0; encoding = none; 
base =
10; } _rwbs;
integer { size = 32; align = 32; signed = 0; encoding = none; 
base =
10; } __cmd_length;
integer { size = 8; align = 8; signed = 0; encoding = none; 
base = 16;
} _cmd[ __cmd_length ];
};
};

Also, does anyone know which block_rq_ events are actually needed to
display the IO load in TraceCompass?

Regards,
Thomas
-- 
Thomas McGuire | thomas.mcgu...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts



smime.p7s
Description: S/MIME Cryptographic Signature
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] Allocation failures with babeltrace and TraceCompass - corrupt trace?

2017-06-14 Thread Thomas McGuire
Hi,

On 14.06.2017 17:12, Mathieu Desnoyers wrote:
> Can you provide a copy of the metadata file ? And ideally the data
> streams too ? This would give us a better idea of what is happening.
> 
> Do you perform kernel or user-space tracing ? Do you trace huge
> sequences of bytes within your own tracepoints ?

I perform kernel traceing only, in this case limited to syscalls,
sched*, block* and irq*. No user-space tracepoints.

I didn't know the metadata file was plain text, I had a quick look into
it and noticed corruption already, with random garbage data inserted all
over the place. I'm surprised babeltrace didn't choke on the metadata
already.
I can not provide the data file as it has confidential data. Looking at
it with a hex editor, I see the same kind of garbage as in the metadata
file, so both files are affected by the same problem.

I've uploaded the metadata file to
http://www.kdab.com/~thomas/stuff/metadata.

To double-check that it isn't file system corruption, I ran "yes >
test.data" - that file is OK, so it's probably a different problem.

Any idea what can cause the corrupted trace?

Regards,
Thomas
-- 
Thomas McGuire | thomas.mcgu...@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts



smime.p7s
Description: S/MIME Cryptographic Signature
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


Re: [lttng-dev] Allocation failures with babeltrace and TraceCompass - corrupt trace?

2017-06-14 Thread Mathieu Desnoyers
- On Jun 14, 2017, at 4:43 AM, Thomas McGuire thomas.mcgu...@kdab.com wrote:

> Hi,
> 
> I have a recurring problem of tools like babeltrace and TraceCompass
> exiting because of an allocation failure.
> 
> For example running babeltrace on a captured trace leads to babeltrace
> processing some events, and then exiting with:
>  GLib-ERROR **: gmem.c:168: failed to allocate 17179869184 bytes
> 
> This sounds like a corrupted trace file to me. Or maybe a version or
> format mismatch?
> 
> In case it helps, babeltrace's backtrace at that point is attached to
> the mail.
> 
> I'm using lttng-relayd 2.9.4 on the host, and lttng 2.9.4 on an i.MX25
> device. Babeltrace is at version 1.5.1.
> The same problem can be seen when not using lttng-relayd.
> 
> I've also seen other problems like "[error] Event id 3336003216 is
> outside range."
> 
> lttng-sessiond shows no warnings or errors during recording, and neither
> does lttng start or stop - not even lost events.
> 
> Any idea how the trace file got corrupted and how I can record traces
> without corruption in the future?

Can you provide a copy of the metadata file ? And ideally the data
streams too ? This would give us a better idea of what is happening.

Do you perform kernel or user-space tracing ? Do you trace huge
sequences of bytes within your own tracepoints ?

Thanks,

Mathieu

> 
> Thanks,
> Thomas
> --
> Thomas McGuire | thomas.mcgu...@kdab.com | Senior Software Engineer
> KDAB (Deutschland) GmbH KG, a KDAB Group company
> Tel: +49-30-521325470
> KDAB - The Qt Experts
> 
> ___
> lttng-dev mailing list
> lttng-dev@lists.lttng.org
> https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
___
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev