Re: electricfence/malloc() query

Marco Peereboom Tue, 14 Mar 2006 07:21:39 -0800

I am not an expert at all at finding these bugs, but I can tell you an anecdote about some code that I maintain.

This is a large application that would fail seemingly at random under a fair amount of stress. The odd thing is that this app, which runs on multiple OSes, seemed to work on RHEL3, AS 2.1, Windows etc., but on RHEL 4 it pooped itself. Since the bug was at first hard to reproduce, it had been mostly ignored until someone found a reliable way of reproducing it. Weeks went by trying to find this bug, which had all the symptoms of either a memory overflow or a synchronization issue. Eventually I found it while disassembling a 3rd party library. Here is what these Einsteins had done:


#include <sys/types.h>  /* u_int32_t */

#include <stdio.h>
#include <stdlib.h>

struct moo {
        u_int32_t       a, b, c, d;
        char            e[44];
};

int
main(int argc, char *argv[])
{
        struct moo      m[2];
        struct moo      *p;

        p = m;

        p->a = 0;
        p->b = 1;

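        /*
         * go to next element
         * (the bug: pointer arithmetic already scales by the element
         * size, so this skips sizeof(struct moo) elements, not one)
         */
        p = p + sizeof(struct moo);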

        p->a = 0;
        p->b = 1;

        return (0);
}
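
For comparison, here is a minimal sketch of what the snippet above presumably meant to do. The helper names next_moo() and buggy_offset_bytes() are made up for illustration, and uint32_t stands in for u_int32_t. Adding an integer n to a struct moo * moves the pointer by n * sizeof(struct moo) bytes, so p + 1 is already "the next element". On common ABIs sizeof(struct moo) is 60, which means the original expression writes about 3600 bytes past the start of m. A write that far away would be consistent with nearby canaries never being touched.

```c
#include <stddef.h>
#include <stdint.h>

struct moo {
        uint32_t        a, b, c, d;
        char            e[44];
};

/*
 * Hypothetical helper: advance to the next array element.
 * Pointer arithmetic is in units of the pointed-to type, so
 * adding 1 moves the pointer by sizeof(struct moo) bytes.
 */
static struct moo *
next_moo(struct moo *p)
{
        return (p + 1);
}

/*
 * Hypothetical helper: how many bytes the buggy expression
 * "p + sizeof(struct moo)" moves the pointer. It advances
 * sizeof(struct moo) elements of sizeof(struct moo) bytes each.
 */
static size_t
buggy_offset_bytes(void)
{
        return (sizeof(struct moo) * sizeof(struct moo));
}
```

With struct moo m[2], next_moo(m) is &m[1], whereas the buggy form lands far outside the two-element array.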

So any tool that I threw at this died mysteriously with the same failure. Valgrind was particularly funny because, since it pretends to be an OS, it pooped itself with the exact failure as the program would. The value add was that it took longer for the code to fail because it slowed down the app by like 1000%.

Eventually I found this gem after I wrote a custom efence-like app that put guard pages, marked PROT_NONE, in front and back of a chunk of allocated memory. I was simply lucky that it hit just right and it gave me proximity to where the code was failing. Interestingly enough, this code also had 32-bit canaries (among an array of other features) in front and back of the allocated memory chunk; however, those were never overwritten.

The moral is that I used several tools, including expensive commercial ones, to track down this bug. It was only with a custom rig that I was able to get a proximity reading. Do not rely on tools to say: "it passed because tool x didn't complain". Bugs can be very subtle or, like this one, symptoms can be very subtle and hard to track down. There is no Swiss Army knife when it comes to memory bugs.

After this was fixed, all kinds of other mysterious bugs that seemed unrelated disappeared, including on all the other OSes. Magic :-)


FWIW,
/marco

Edd Barrett wrote:

Hello people,

I wish to query the usefulness (if that's not a made-up word) of
electricfence on OpenBSD. I have a program which works great when not linked
against -lefence, but gives a bus error otherwise (not as a result of my
code, but in libpq according to a stack trace :O ).

A Google search later, and I find this page (http://kerneltrap.org/node/5584)
in which Theo explains that the new malloc() does exactly what electric
fence does by default. So my questions are:

a) Why do we have a port of electric fence?

b) If my program runs fine on OpenBSD without -lefence can I assume that no
buffers have been over-run?

c) (off-topic) How are people checking for memory leaks these days on
OpenBSD? I took a quick look at gc-boehm, but haven't got it working as of
yet. How well does it work for you and what alternatives exist? It seems
most are using valgrind, but that's very Linux-centric if I understand
correctly.

Thanks for your time, guys

Best Regards

Edd
