Hey Ryan,

Thanks for providing the core file to me offline. This turns out to be a
relatively simple bug in libdtrace:

> $C
ffffffff7f7ff6b1 libc.so.1`_ndoprnt+4(ffffffff7f174760, ffffffff7f800e58,
ffffffff7f800d38, 0, ff00, 2f000000)
ffffffff7f800481 libc.so.1`snprintf+0x88(ffffffff7f801690, 0, ffffffff7f174760,
ffffffff7f800e9c, ffffffff7f801290, c4)
ffffffff7f8005c1 libdtrace.so.1`dtrace_uaddr2str+0x108(1001117e0, 6106, 
100521a6c, ffffffff7f801f40, 10d, ffffffff7f801690)
ffffffff7f801691 libdtrace.so.1`pfprint_uaddr+0xd8(1001117e0, 10010b570,
ffffffff7fffde54, 6106, 10d, 10)
ffffffff7fffd4d1 libdtrace.so.1`dt_printf_format+0x6f4(ffffffff7fffde55, 
100577ed8, 100114d40, ffffffff7fffde55, ffffffff7f276000, ffffffff7f1584bc)
ffffffff7fffd6b1 libdtrace.so.1`dt_fprinta+0x88(0, ffffffff7fffe108, 100114d40, 
4, 100577e70, 100577e98)
ffffffff7fffd791 libdtrace.so.1`dt_aggregate_walk_sorted+0x108(1001117e0,
ffffffff7f15af84, ffffffff7fffe108, 4c50, 98a, 1013ddb20)
ffffffff7fffd841 libdtrace.so.1`dtrace_fprinta+0x11c(1001117e0, 10010b570, 
100114d40, 1, 10010dac8, 1)
ffffffff7fffd941 libdtrace.so.1`dt_consume_cpu+0x78c(1001117e0, 10010b570, 0,
ffffffff7f276000, 1, 1009949b8)
ffffffff7fffdb41 libdtrace.so.1`dtrace_consume+0x2f8(1001117e0, 10010b570, 
1000044f8, 10000446c, c0, 0)
ffffffff7fffdc01 libdtrace.so.1`dtrace_work+0x108(1001117e0, 10010b570, 
1000044f8, 10000446c, 0, 1)
ffffffff7fffdcb1 main+0x1fb0(10010b, 10010b390, 10010b378, 100000, 10010b, 
100000)
ffffffff7fffdee1 _start+0x17c(0, 0, 0, 0, 0, 0)

Note how big the frame for dt_printf_format() is:
> ffffffff7fffd4d1-ffffffff7f801691=K
                7fbe40          
> (ffffffff7fffd4d1-ffffffff7f801691)%0t1024=D
                8175            

That's an 8MB stack frame!

Here's the problem in dt_printf_format():

        for (i = 0; i < pfv->pfv_argc; i++, pfd = pfd->pfd_next) {
                ...
                if (pfd->pfd_preflen != 0) {
                        char *tmp = alloca(pfd->pfd_preflen + 1);

                        bcopy(pfd->pfd_prefix, tmp, pfd->pfd_preflen);
                        tmp[pfd->pfd_preflen] = '\0';

                        if ((rval = dt_printf(dtp, fp, tmp)) < 0)
                                return (rval);
                        ...

We're allocating an unbounded amount of data on the stack: memory obtained
from alloca(3C) isn't released until the function returns, so each trip
through the loop grows the frame. With enough aggregation data, the frame
blows past the stack limit. You may be able to work around the problem by
using ulimit(1) to raise the stack size limit for dtrace(1M).

I've filed the following bug:

  6969970 dt_printf_format() calls alloca(3C) in a loop

Adam

On Fri, Jul 16, 2010 at 05:41:51PM +0200, Ryan Johnson wrote:
> Update: the bug manifests quite often no matter what aggsize I use. I just 
> got lucky that one time...
>
> Has anyone else seen this?
>
> Ryan
>
> On 7/15/2010 3:51 PM, Ryan Johnson wrote:
>> Hi all,
>>
>> Odd behavior here: I'm using the profile probe to sample stack traces in 
>> my app, 100+ threads banging hard on a T5440. The script exits after 30 
>> seconds of profiling and seg faults while trying to print the aggregated 
>> stack trace counts on exit:
>>> $ dbx - core
>>> Corefile specified executable: "/usr/sbin/sparcv9/dtrace"
>>> For information about new features see `help changes'
>>> To remove this message, put `dbxenv suppress_startup_message 7.8' in your 
>>> .dbxrc
>>> Reading /usr/sbin/sparcv9/dtrace
>>> core file header read successfully
>> <snipped long list of .so files>
>>> program terminated by signal SEGV (no mapping at the fault address)
>>> 0xffffffff7e8a8ff0: _ndoprnt+0x0004:    stx      %i2, [%sp + 2223]
>>> (dbx) bt
>>> =>[1] _ndoprnt()
>>>   [2] snprintf()
>>>   [3] dtrace_uaddr2str()
>>>   [4] pfprint_uaddr()
>>>   [5] dt_printf_format()
>>>   [6] dt_fprinta()
>>>   [7] dt_aggregate_walk_sorted()
>>>   [8] dtrace_fprinta()
>>>   [9] dt_consume_cpu()
>>>   [10] dtrace_consume()
>>>   [11] dtrace_work()
>>>   [12] main()
>>
>> The problem seems related to the fact that I had specified 'dtrace -x 
>> aggsize=256k' -- rerunning with aggsize=2048k makes the problem go away. 
>> Note that I didn't see any messages from dtrace about drops, so I can't 
>> say whether 256k should have been enough or not. However, the app has a 
>> lot of code and deep call stacks, so I could easily imagine 256k not being 
>> enough.
>>
>> Ideas? Should I file a bug report?
>>
>> Ryan
>>
>> _______________________________________________
>> dtrace-discuss mailing list
>> dtrace-discuss@opensolaris.org
>

-- 
Adam Leventhal, Fishworks                     http://blogs.sun.com/ahl