+---------- On Oct 25, Andrew Piskorski said:
> What could possibly be causing it to segfault in _smalloc, of all
> places?

Heap corruption. Segfaults in malloc or free are virtually always due to
heap corruption. Probably something is overflowing a buffer, or freeing
something twice, or writing to freed memory.

> I normally use gcc 2.95.2, but I also tried Sun cc (aka, Sun WorkShop
> 6 update 2 C 5.3 2001/05/15) - seems to make no difference.  Purify
> (with both gcc and Sun cc) reports various sorts of "errors", none of
> which I can do anything about (in the vendor library), and I'm not at
> all convinced they have anything to do with this problem anyway.

Sounds suspicious.  If they're just UMRs (uninitialized memory reads),
they're probably not the culprit.  If they are writes to unallocated or
freed memory, they are definitely problems and probably related to your
segfaults.

Here's one method I used with occasional success when debugging this
sort of problem (back at ArsDigita, before I got Purify working): Dump
the instruction that got the fault, which you can do with "x/1i $pc".
From that you can figure out what register malloc is dereferencing.
Look at the contents of that register. Suppose the register contains
something like 0x6E647265. It just so happens that all of those bytes
are ASCII characters; they spell "ndre". So if you have some idea of
what code handles the string "ndre", you have an idea of where to look.
Usually you won't be so lucky and the bogus pointer won't just be ASCII
characters. Then you can examine the code leading up to the faulting
instruction to figure out where the bogus pointer got loaded from, and
look in that part of memory for clues: ASCII strings, and things that
look like structs used by your C code.
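
To make the "is this pointer really a string?" check concrete, here is
a minimal C sketch. The name word_as_ascii is my own invention for
illustration, not part of gdb or Purify. It assumes a 32-bit register
value and reads the bytes most significant first, which matches memory
order on a big-endian machine such as SPARC; on a little-endian machine
the characters would come out reversed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode a 32-bit word as four bytes, most
 * significant byte first.  Printable ASCII bytes are copied to out;
 * anything else is shown as '.'.  out must hold at least 5 chars.
 * Returns 1 if all four bytes were printable ASCII, otherwise 0. */
int word_as_ascii(uint32_t word, char out[5])
{
    int all_printable = 1;
    int i;

    assert(out != NULL);

    for (i = 0; i < 4; i++) {
        unsigned char byte = (unsigned char)(word >> (24 - 8 * i));
        if (byte >= 0x20 && byte <= 0x7E) {
            out[i] = (char)byte;
        } else {
            out[i] = '.';
            all_printable = 0;
        }
    }
    out[4] = '\0';
    return all_printable;
}
```

Calling word_as_ascii(0x6E647265, buf) fills buf with "ndre" and
returns 1. A return of 0 means at least one byte was not printable, so
the value is probably not a stray piece of a string.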

It's a rather difficult process, and probably not necessary. You have
Purify. It's pretty darn good about these things. If you're running nsd
with the -z flag, turn it off while debugging this, because it might
make Purify work even better. You say that Purify is already reporting
errors - you should get those fixed. If you can't get your vendor to fix
his library, then either replace that library or isolate it in another
process and talk to it over a socket or pipes.
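
As a rough illustration of the "isolate it in another process" idea,
here is a minimal sketch using fork() and a pipe. Everything named here
is hypothetical: call_isolated is my own wrapper, and vendor_double and
vendor_crash are stand-ins for whatever the vendor library actually
exports. It forks once per call, which is simple but slow; a real setup
would more likely keep one long-lived helper process and send requests
to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins for vendor library entry points. */
int vendor_double(int x) { return 2 * x; }
int vendor_crash(int x)  { (void)x; abort(); return 0; }

/* Run fn(arg) in a child process so that any heap corruption or crash
 * stays in the child.  Returns 0 and stores the answer in *result on
 * success; returns -1 (leaving *result untouched) if the child died
 * or the pipe failed. */
int call_isolated(int (*fn)(int), int arg, int *result)
{
    int fds[2];
    int value = 0;
    int status = 0;
    ssize_t n;
    pid_t pid;

    assert(fn != NULL && result != NULL);

    if (pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: call the library and send the result to the parent. */
        close(fds[0]);
        value = fn(arg);
        n = write(fds[1], &value, sizeof value);
        _exit(n == (ssize_t)sizeof value ? 0 : 1);
    }

    /* Parent: read the result; EOF here means the child died early. */
    close(fds[1]);
    n = read(fds[0], &value, sizeof value);
    close(fds[0]);

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (n != (ssize_t)sizeof value)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    *result = value;
    return 0;
}
```

With this, call_isolated(vendor_double, 21, &r) returns 0 and sets r to
42, while call_isolated(vendor_crash, 1, &r) returns -1 and the calling
process keeps running with its own heap intact.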
