Hi all,

We have encountered a problem using LTTng-UST tracing with our 
application: on one particular VMware vCenter cluster we almost always get 
segfaults when tracepoints are enabled, whereas on another vCenter cluster, and 
on every other machine we've ever used, we don't hit this problem.

I can reproduce this using lttng-ust/tests/hello after setting up a session with:

"""
lttng create
lttng enable-channel channel0 --userspace
lttng add-context --userspace -t vpid -t vtid -t procname
lttng enable-event --userspace "ust_tests_hello:*" -c channel0
lttng start
"""

In which case I get the following stack trace with an obvious NULL pointer 
dereference:

"""
Program terminated with signal SIGSEGV, Segmentation fault.
#0  v_read (config=<optimized out>, v_a=0x0) at vatomic.h:48
48              return uatomic_read(&v_a->a);
[...]
#0  v_read (config=<optimized out>, v_a=0x0) at vatomic.h:48
#1  0x00007f4aa10a4804 in lib_ring_buffer_try_reserve_slow (
    buf=0x7f4a98008a00, chan=0x7f4a98008a00, offsets=0x7fffef67c620,
    ctx=0x7fffef67ca40) at ring_buffer_frontend.c:1677
#2  0x00007f4aa10a6c9f in lib_ring_buffer_reserve_slow (ctx=0x7fffef67ca40)
    at ring_buffer_frontend.c:1819
#3  0x00007f4aa1095b75 in lib_ring_buffer_reserve (ctx=0x7fffef67ca40,
    config=0x7f4aa12b8ae0 <client_config>)
    at ../libringbuffer/frontend_api.h:211
#4  lttng_event_reserve (ctx=0x7fffef67ca40, event_id=0)
    at lttng-ring-buffer-client.h:473
#5  0x000000000040135f in __event_probe__ust_tests_hello___tptest (
    __tp_data=0xed3410, anint=0, netint=0, values=0x7fffef67cb50,
    text=0x7fffef67cb70 "test", textlen=<optimized out>, doublearg=2,
    floatarg=2222, boolarg=true) at ././ust_tests_hello.h:32
#6  0x0000000000400d2c in __tracepoint_cb_ust_tests_hello___tptest (
    boolarg=true, floatarg=2222, doublearg=2, textlen=4,
    text=0x7fffef67cb70 "test", values=0x7fffef67cb50,
    netint=<optimized out>, anint=0) at ust_tests_hello.h:32
#7  main (argc=<optimized out>, argv=<optimized out>) at hello.c:92
"""

I hit this segfault 10 out of 10 times when running "hello" on a VM on one 
vCenter, and 0 out of 10 times on a VM on the other. The VMs otherwise had the 
same software installed on them:

- CentOS 6-based
- kernel-2.6.32-504.1.3.el6 with some minor changes made in networking
- userspace-rcu-0.8.3, lttng-ust-2.3.2 and lttng-tools-2.3.2, which might have 
some minor patches backported, plus leftovers of changes made to get them to 
build on CentOS 5
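
In case it helps anyone trying to reproduce the 10-out-of-10 versus 0-out-of-10 
comparison, here is a minimal sketch of how the runs can be counted. It assumes 
the tracing session above is already started and that the test binary is 
reachable as ./hello (adjust the path for your build tree). The function name 
count_segfaults is made up for illustration, and it relies on the usual shell 
convention that a process killed by SIGSEGV (signal 11) is reported with exit 
status 128 + 11 = 139.

```shell
#!/bin/bash
# Hypothetical helper, not part of LTTng: run a command N times and print
# how many of those runs were killed by SIGSEGV.
count_segfaults() {
    local runs=$1
    shift
    local crashes=0 i status
    for ((i = 0; i < runs; i++)); do
        # Discard the program's own output; only the exit status matters here.
        "$@" >/dev/null 2>&1
        status=$?
        # The shell reports 128 + signal number; SIGSEGV is 11, hence 139.
        if [ "$status" -eq 139 ]; then
            crashes=$((crashes + 1))
        fi
    done
    echo "$crashes"
}

# Example (assumed path to the test binary):
#   count_segfaults 10 ./hello
```

Running the same command on a VM in each cluster gives a directly comparable 
crash count.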

On the "good" vCenter, I tested on two different VM hosts:

Processor Type: Intel(R) Xeon(R) CPU E5530 @ 2.40GHz
EVC Mode: Intel(R) "Nehalem" Generation
Image Profile: (Updated) ESXi-5.1.0-799733-standard

Processor Type: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
EVC Mode: Intel(R) "Nehalem" Generation
Image Profile: (Updated) ESXi-5.1.0-799733-standard

The "bad" vCenter VM host that I tested on had this configuration:

ESX Version: VMware ESXi, 5.0.0, 469512
Processor Type: Intel(R) Xeon(R) CPU X7550 @ 2.00GHz

Any ideas?

Thanks in advance,
David
