+---------- On Oct 25, Andrew Piskorski said:
> What could possibly be causing it to segfault in _smalloc, of all
> places?

Heap corruption. Segfaults in malloc or free are virtually always due to heap corruption. Probably something is overflowing a buffer, freeing something twice, or writing to freed memory.

> I normally use gcc 2.95.2, but I also tried Sun cc (aka, Sun WorkShop
> 6 update 2 C 5.3 2001/05/15) - seems to make no difference. Purify
> (with both gcc and Sun cc) reports various sorts of "errors", none of
> which I can do anything about (in the vendor library), and I'm not at
> all convinced they have anything to do with this problem anyway.

Sounds suspicious. If they're just UMRs (uninitialized memory reads), they're probably not the culprit. If they are writes to unallocated or freed memory, they are definitely problems and probably related to your segfaults.

Here's one method I used with occasional success when debugging this sort of problem (back at ArsDigita, before I got Purify working):

Dump the instruction that got the fault, which you can do in gdb with "x/1i $pc". From that you can figure out which register malloc is dereferencing. Look at the contents of that register.

Suppose the register contains something like 0x6E647265. It just so happens that all of those bytes are ASCII characters; they spell "ndre". So if you have some idea of what code handles the string "ndre", you have an idea of where to look.

Usually you won't be so lucky, and the bogus pointer won't be made of ASCII characters. Then you can examine the code leading up to the faulting instruction to figure out where the bogus pointer was loaded from, and look in that part of memory for clues: ASCII strings, and things that look like structs used by your C code.

It's a rather difficult process, and probably not necessary. You have Purify, and it's pretty darn good about these things. If you're running nsd with the -z flag, turn it off while debugging this, because it might make Purify work even better. You say that Purify is already reporting errors - you should get those fixed.
If you can't get your vendor to fix his library, then either replace that library or isolate it in another process and talk to it over a socket or pipes.
