Hi Lassi,
I'll get back to you on your questions soon.

Thanks.

On Mon, Sep 5, 2011 at 7:40 PM, Lassi Tuura <[email protected]> wrote:

> Hi,
>
> > Here is the malloc example:
> > void *malloc(size_t size)
> > {
> >     void *p;
> >
> >     if (!origMallocFp)
> >         getInstance();  /* resolves origMallocFp; done with pthread_once */
>
> If I understand your quote correctly, this may end up calling dlsym(),
> which may internally call malloc(). You haven't really pasted enough code
> here to tell for sure that your code is problem-free; it's hard to reason
> about it from the information at hand. You might want to review your code
> with a very critical eye on all the calls it makes.
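>
> For what it's worth, the usual way around that recursion is to serve
> dlsym()'s own allocations from a small static bootstrap buffer until the
> real malloc has been resolved. A minimal sketch of the trick (names are
> illustrative, not yours; it deliberately skips pthread_once so the
> re-entry path stays obvious, and real code would also have to make
> free() ignore pointers that fall inside the bootstrap buffer):
>
>     #define _GNU_SOURCE
>     #include <dlfcn.h>
>     #include <stddef.h>
>
>     static void *(*origMallocFp)(size_t);
>     static char bootstrap[4096] __attribute__((aligned(16)));
>     static size_t bootstrap_used;        /* not thread-safe; a sketch */
>
>     void *malloc(size_t size)
>     {
>         if (!origMallocFp) {
>             static __thread int resolving;  /* dlsym() re-entry guard */
>             if (resolving) {                /* malloc() from inside dlsym() */
>                 void *p = bootstrap + bootstrap_used;
>                 bootstrap_used += (size + 15) & ~(size_t)15;
>                 return bootstrap_used <= sizeof bootstrap ? p : NULL;
>             }
>             resolving = 1;
>             origMallocFp = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
>             resolving = 0;
>         }
>         return origMallocFp(size);          /* plus your tracing here */
>     }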
>
> >     formStackPacket(packet + pktHdrLen, (unsigned int *)bt, numEntries);
> >     rssSend(packet, sizeof(unsigned int) * (numEntries + pktHdrLen));
>
> Black boxes; it's hard to say what they do. Could they allocate memory
> or otherwise get into trouble?
>
> It could be something as simple as these routines having an error path
> which fires when you send the larger packets with full stack traces, and
> that error path doing some memory allocation. Without stack tracing you
> might never hit it.
>
> > Yes, I experienced this when I first tried with glibc backtrace() and
> > also with printfs when I first started.
> > Hence I removed all that, and this works fine. For days on end I can
> > profile the app and get the stats.
> > Without the stack trace this is only half the job done, and teams take
> > longer to find the exact location of the leak :-(
>
> Unfortunately that doesn't say much. You could just be lucky and not call
> anything which triggers problems. For example if you add stacks to your
> network stuff, maybe it exceeds some threshold and does some allocation, or
> hits an error path you don't otherwise trigger, or ...?
>
> > Also, when I don't link with -lunwind, the code is stable. I have tried
> > with different versions of the app and it is consistent.
> > So there is no recursive malloc hazard without libunwind, for sure.
>
> It's a data point, but it could just be circumstantial. It's hard to say
> for sure from that data alone.
>
> > Great. Did you use the LD_PRELOAD trick? It is so appealing because of
> > its ease of instrumentation.
>
> No, we inject a hook into functions by rewriting the function prologue on
> the fly.
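>
> In case it's useful, the rough idea looks like this (a hugely simplified
> sketch of prologue rewriting in general, not our actual tool; x86-64
> only, and it ignores relocating the clobbered instructions to a
> trampoline so the original can still be called):
>
>     #include <stdint.h>
>     #include <string.h>
>     #include <sys/mman.h>
>     #include <unistd.h>
>
>     /* Overwrite the first bytes of `target` with a jump to `hook`. */
>     static int install_hook(void *target, void *hook)
>     {
>         unsigned char stub[12] =
>             { 0x48, 0xb8, 0,0,0,0,0,0,0,0,   /* movabs $hook, %rax */
>               0xff, 0xe0 };                  /* jmp    *%rax       */
>         memcpy(stub + 2, &hook, sizeof hook);
>
>         long pg = sysconf(_SC_PAGESIZE);
>         uintptr_t page = (uintptr_t)target & ~(uintptr_t)(pg - 1);
>         /* two pages, in case the stub straddles a page boundary */
>         if (mprotect((void *)page, 2 * pg, PROT_READ|PROT_WRITE|PROT_EXEC))
>             return -1;
>         memcpy(target, stub, sizeof stub);   /* clobbers the prologue */
>         return mprotect((void *)page, 2 * pg, PROT_READ|PROT_EXEC);
>     }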
>
> > That's why I am not giving up yet on getting the backtrace. The target
> > is a small device with a NAND-based filesystem and cannot hold huge
> > amounts of data. Hence I send it to the host for post-processing.
>
> Let me throw a few ideas out here, though extensive follow-up would
> probably be better taken off the libunwind list.
>
> On x86-64 we use libunwind to capture stack trace (ia32 uses something
> else) on every allocation. Each allocation is associated to its full stack
> trace, and we can dump this "heap snapshot" at any time during running, or
> at the end as a final profile result. We use these for leak checking,
> identifying peak use, general allocation profiling, correlating performance
> and allocation behaviour, looking for churn, delta comparisons between
> runs/versions, fragmentation and locality studies, etc. The heap snapshots
> are many orders of magnitude smaller than the entire stream of stack traces
> on allocation would be.
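>
> The capture itself is just the standard local unwind loop; a minimal
> sketch using the stock libunwind API (the buffer handling around it is
> illustrative):
>
>     #define UNW_LOCAL_ONLY
>     #include <libunwind.h>
>     #include <stdint.h>
>
>     /* Fill pc[] with up to `max` return addresses of the calling thread. */
>     static int capture_stack(uintptr_t *pc, int max)
>     {
>         unw_context_t ctx;
>         unw_cursor_t cur;
>         int depth = 0;
>
>         unw_getcontext(&ctx);
>         unw_init_local(&cur, &ctx);
>         while (depth < max && unw_step(&cur) > 0) {
>             unw_word_t ip;
>             unw_get_reg(&cur, UNW_REG_IP, &ip);
>             pc[depth++] = (uintptr_t)ip;
>         }
>         return depth;
>     }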
>
> The applications we profile generate a prodigious number of allocation
> samples: stacks on average 40 levels deep, from 700 or so shared
> libraries, 1-3 million times a second. It's not unusual for us to track
> ~7-10 million concurrently live allocations. The apps run anywhere from
> ~15 minutes to 24 hours.
>
> A long time ago we used to generate a serialised stream of stack traces,
> as you appear to do, then absorb it in a collector into a summary. We
> moved away from that because there was no way to keep up with the data
> stream at the rate it was produced, even when the consumer was
> multi-threaded and used numerous tricks to speed up consuming the stack
> trace data. But maybe your data rate isn't as high as ours...
>
> We've settled on a data structure which is moderate enough in extra size
> (it needs <100% extra virtual memory), is fast enough to update (~140%
> run-time increase at a 1 MHz allocation rate, vs. 10-20x for valgrind),
> and handles multi-threaded apps too. If the allocation rate is less
> frantic, the overhead is less, much less. The heap snapshots are a very
> manageable size, about 30 MB compressed per 1-2 GB of VSIZE.
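>
> In case the shape of it helps, the core of such a structure is usually
> an interning table of unique stack traces plus a live-allocation map.
> A rough sketch; ours differs in detail, and the names are made up:
>
>     #include <stdint.h>
>     #include <stddef.h>
>
>     #define MAX_DEPTH 64
>
>     /* One unique call stack, shared by every allocation made from it. */
>     struct trace {
>         struct trace *next;          /* hash-bucket chain             */
>         uint64_t      hash;
>         int           depth;
>         uintptr_t     pc[MAX_DEPTH];
>         size_t        live_bytes;    /* currently allocated from here */
>         size_t        live_count;
>     };
>
>     /* ptr -> trace record for every live allocation. */
>     struct alloc {
>         struct alloc *next;
>         void         *ptr;
>         size_t        size;
>         struct trace *t;
>     };
>
>     /* On malloc: intern the captured stack into the trace table, bump
>      * its counters, and file (ptr, size, trace) in the alloc map.
>      * On free: look up ptr, decrement the trace's counters, unlink.
>      * A heap snapshot is then just a walk over the trace table. */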
>
> I don't know what sort of constraints you have on your target device, or
> what your target app's behaviour is, but my experience was that
> summarising the allocation data in-process, in virtual memory, was by
> far the winner. YMMV; much depends on how much extra RAM you can spare,
> what sort of allocation rate you experience, and other factors.
>
> Regards,
> Lassi
>
>
_______________________________________________
Libunwind-devel mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/libunwind-devel
