Hi Lassi,

Will get back to you on your questions soon. Thanks.
On Mon, Sep 5, 2011 at 7:40 PM, Lassi Tuura <[email protected]> wrote:
> Hi,
>
> > Here is the malloc example
> >
> > void *malloc(size_t size)
> > {
> >     void *p;
> >
> >     if (!origMallocFp)
> >         getInstance();    => This is done with pthread_once.
>
> If I understand your quote correctly, this may end up calling dlsym(),
> which may internally call malloc(). You are not really pasting enough
> code here to tell for sure that your code is problem-free; it's hard
> to reason about the code based on the information at hand. You might
> want to review your code with a very critical eye on all calls.
>
> > formStackPacket (packet+pktHdrLen, (unsigned int *)bt, numEntries);
> > rssSend (packet, sizeof (unsigned int) * (numEntries + pktHdrLen));
>
> Black boxes, hard to say what they do. Could they allocate memory or
> otherwise end in trouble?
>
> It could be something as simple as these bits having an error path
> which gets triggered when you send more data with the full stack
> trace, and the error path does some memory allocation. Without stack
> tracing you might never hit the error path.
>
> > Yes, I experienced this when I first tried with glibc backtrace()
> > and also printfs when I first started. Hence I removed all that and
> > this works fine. For days together I can profile the app and get
> > the stats. Without a stack trace, this is only half the job done
> > and teams take longer to find the exact place of the leak :-(
>
> Unfortunately that doesn't say much. You could just be lucky and not
> call anything which triggers problems. For example, if you add stacks
> to your network stuff, maybe it exceeds some threshold and does some
> allocation, or hits an error path you don't otherwise trigger, or ...?
>
> > Also, when I don't link with -lunwind, the code is stable. I have
> > tried with different versions of the app and it is consistent. So
> > there is no recursive malloc hazard without unwind for sure.
>
> It's a data point, but could just be circumstantial. It's hard to say
> for sure from the data.
>
> > Great. Did you use the LD_PRELOAD trick? It is so appealing because
> > of its ease of instrumentation.
>
> No, we inject a hook into functions by rewriting the function
> prologue on the fly.
>
> > That's why I am not giving up yet on getting the backtrace. The
> > target is a small device with a NAND-based filesystem and cannot
> > hold huge data. Hence I send it to the host for post-processing.
>
> Let me throw a few ideas out here, though extensive follow-up would
> probably be better off the libunwind list.
>
> On x86-64 we use libunwind to capture a stack trace on every
> allocation (ia32 uses something else). Each allocation is associated
> with its full stack trace, and we can dump this "heap snapshot" at
> any time during the run, or at the end as a final profile result. We
> use these for leak checking, identifying peak use, general allocation
> profiling, correlating performance and allocation behaviour, looking
> for churn, delta comparisons between runs/versions, fragmentation and
> locality studies, etc. The heap snapshots are many orders of
> magnitude smaller than the entire stream of stack traces on
> allocation would be.
>
> The applications we profile generate a prodigious number of
> allocation samples: on average 40-level-deep stacks from 700 or so
> shared libraries, 1-3 million times a second. It's not unusual for us
> to track ~7-10 million concurrently live allocations. The apps run
> anywhere from ~15 minutes to 24 hours.
>
> A long long time ago we used to generate a serialised stream of stack
> traces, like you appear to do, then absorb it in a collector into a
> summary. We moved away from doing that because there was no way to
> deal with the data stream at the rate it was produced, even if the
> consumer was multi-threaded and used numerous tricks to speed up
> consuming the stack trace data. But maybe your data rate isn't as
> high as ours...
>
> We've settled on a data structure which is moderate enough in extra
> size (= needs <100% extra virtual memory) and is fast enough to
> update (~140% run-time increase at a 1 MHz allocation rate, vs.
> x10-20 for valgrind), and handles multi-threaded apps too. If the
> allocation rate is less fanatic, the overhead is less, much less. The
> heap snapshots are a very manageable size, about 30 MB compressed per
> 1-2 GB of VSIZE.
>
> I don't know what sort of constraints you have on your target device,
> or what your target app's behaviour is, but my experience was that
> summarising the allocation data in-process, in virtual memory, was by
> far the winner. YMMV; much depends on how much extra RAM you can
> expend, what sort of allocation rate you experience, and other
> factors.
>
> Regards,
> Lassi
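The dlsym() recursion hazard discussed above is the classic trap in
this kind of wrapper. A common way to make an LD_PRELOAD malloc hook
safe is to satisfy any allocation dlsym() itself performs from a small
static arena, and to guard the capture path with a per-thread flag so
nothing the unwinder calls can re-enter the hook. The sketch below is
illustrative only, assuming glibc and libunwind's unw_backtrace()
helper (the explicit unw_getcontext()/unw_init_local()/unw_step() loop
works just as well); origMallocFp and getInstance() are kept from the
code quoted above, while bootstrapBuf, inHook and the rest are made up
for the sketch:

    /* Sketch only: build as a shared object (gcc -shared -fPIC ...
     * -ldl -lunwind) and LD_PRELOAD it into the target. */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <pthread.h>
    #include <stddef.h>
    #include <libunwind.h>

    #define MAX_DEPTH 64

    static void *(*origMallocFp)(size_t);

    /* Small static arena that satisfies any allocation dlsym()
       itself makes while the real malloc is still being resolved.
       NB: the increment is racy; a real wrapper would use an atomic. */
    static char bootstrapBuf[4096];
    static size_t bootstrapUsed;

    /* Per-thread guard so the unwinder (or anything it calls)
       cannot re-enter the hook and recurse. */
    static __thread int inHook;

    static pthread_once_t initOnce = PTHREAD_ONCE_INIT;

    static void getInstance(void)
    {
        origMallocFp = (void *(*)(size_t)) dlsym(RTLD_NEXT, "malloc");
    }

    void *malloc(size_t size)
    {
        if (!origMallocFp) {
            if (inHook) {
                /* Re-entered from dlsym(): hand out arena memory. */
                size_t rounded = (size + 15) & ~(size_t) 15;
                if (bootstrapUsed + rounded > sizeof bootstrapBuf)
                    return 0;                /* arena exhausted */
                void *p = bootstrapBuf + bootstrapUsed;
                bootstrapUsed += rounded;
                return p;
            }
            inHook = 1;
            pthread_once(&initOnce, getInstance);
            inHook = 0;
        }

        void *p = origMallocFp(size);

        if (p && !inHook) {
            inHook = 1;              /* capture path may allocate */
            void *bt[MAX_DEPTH];
            int n = unw_backtrace(bt, MAX_DEPTH);
            /* ... record (p, size, bt[0..n)) or send it off-board ... */
            (void) n;
            inHook = 0;
        }
        return p;
    }

A complete wrapper would also interpose free() (ignoring pointers that
fall inside bootstrapBuf, and decrementing whatever bookkeeping the
hook maintains) and would keep the capture path clear of anything that
can allocate, stdio included.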

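For concreteness, here is roughly what "summarising the allocation
data in-process" can look like: hash each captured stack, keep one
accumulator record per unique stack, and have the allocation hook
update the matching record. Everything below (the names, the table
size, the open-addressing scheme) is a hypothetical sketch of the
general technique, not the actual data structure Lassi describes:

    /* Hypothetical in-process summary. One record per unique call
     * stack; the pointer->record map that lets free() decrement
     * liveBytes is omitted for brevity. Not thread-safe as shown;
     * a real profiler needs locking or per-thread accumulation. */
    #include <stdint.h>
    #include <string.h>

    #define MAX_DEPTH  64
    #define TABLE_SIZE (1 << 16)  /* power of two; sized statically
                                     for brevity, grown dynamically
                                     in real life, and assumed never
                                     to fill up */

    typedef struct {
        uint64_t hash;            /* 0 = empty slot */
        void    *frames[MAX_DEPTH];
        int      depth;
        uint64_t liveBytes;       /* current footprint, this stack */
        uint64_t totalAllocs;     /* churn counter */
    } StackRecord;

    static StackRecord table[TABLE_SIZE];

    static uint64_t hashStack(void **frames, int depth)
    {
        uint64_t h = 14695981039346656037ull;  /* 64-bit FNV-1a */
        for (int i = 0; i < depth; i++) {
            h ^= (uintptr_t) frames[i];
            h *= 1099511628211ull;
        }
        return h ? h : 1;                      /* 0 means "empty" */
    }

    /* Called from the malloc hook with the captured backtrace. */
    void recordAlloc(void **frames, int depth, size_t size)
    {
        uint64_t h = hashStack(frames, depth);
        size_t   i = h & (TABLE_SIZE - 1);

        while (table[i].hash &&
               (table[i].hash != h || table[i].depth != depth ||
                memcmp(table[i].frames, frames,
                       depth * sizeof(void *)) != 0))
            i = (i + 1) & (TABLE_SIZE - 1);    /* linear probing */

        if (!table[i].hash) {                  /* new unique stack */
            table[i].hash  = h;
            table[i].depth = depth;
            memcpy(table[i].frames, frames, depth * sizeof(void *));
        }
        table[i].liveBytes   += size;
        table[i].totalAllocs += 1;
    }

Dumping the table produces a heap snapshot whose size scales with the
number of unique call stacks rather than with the number of allocation
events, which is why it stays orders of magnitude smaller than
streaming every trace off the device.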