On Jul 26, 2008, at 19:03, Giorgos Keramidas wrote:

While that's understandable, the current malloc() has undergone quite
extensive testing by Jason Evans and a lot of people who use it in
FreeBSD 7.X or later. Its ability to expose bugs in this way was deemed
important enough that it is now used by other projects too.

While in general I like the new approach, this problem has been a killer. I did find a number of errors in my own code where I was not allocating enough space for some things. Those showed up instantly with 7.0 and were easy to fix.



What Kris wrote in:

   Finally, there is no way to revert to the "old approach"
   because the new allocator is completely new; it allocates
   memory based on its own strategy.  None of the malloc options
   affect the behaviour of correct programs (but some of them
   can help to improve performance, or to debug incorrect
   programs).

is important.  Even if you tweak enough options, the new malloc() may
*not* behave similarly enough for the program to keep working.  If you
are losing money _right_ _now_ because of problems in the program, it
may be worth going back to 6-STABLE and the old malloc() until the
program's developers have fixed its bugs.

Unfortunately that is not possible. We upgraded the hardware and some of the components were not supported very well under 6.x. Despite several weeks of testing the new hardware with 7.0, the problem did not show up until several weeks after going into production. It takes about a week of real time before the problem tends to become visible. By compressing the workload I have been able to set up a test machine on which it takes 2-4 days before it occurs.



Not surprising, but I seem to recall that when it was first introduced
into stable there was some discussion on how to make it behave more
like the old malloc.  I couldn't find that via a search, though.

If all else fails, you can try forward-porting phkmalloc to 7.X, but
that is not necessarily easier than temporarily going back to 6.X and
fixing the program to work correctly on 7.X.

It basically all boils down to ``How much time do you want to spend with
a possibly crashing service?''

There's definitely a bug somewhere and you ultimately need it resolved.
It is highly unlikely that it is in malloc() itself, but you can
probably use its debugging features to help you find out whether it is
a bug in malloc() (see the preprocessor define MALLOC_PRODUCTION in
libc/stdlib/malloc.c) or a bug in the program using malloc(), and
_where_ it may be.

The new malloc() also includes an option that can dump 'utrace' debug
output of all the malloc(), calloc(), realloc(), posix_memalign() and
free() calls of malloc.c.  If you haven't tried it already, it may be
another useful tool to help you track down where the bug is.

Tracing a program's malloc usage with the 'U' option is relatively easy
to do if you spawn just *this* program with MALLOC_OPTIONS='U':

   # ktrace env MALLOC_OPTIONS='U' your-program-here

Then you can dump the 'utrace' entries logged by ktrace, with:

   # kdump [optionally, more kdump options] -f ktrace.out

You should see something like this:

   $ kdump -T -t u -f ktrace.out | head -40
    26674 ls       1217123351.156040 USER  malloc_init()
    26674 ls       1217123351.156369 USER  0x8101000 = malloc(4096)
    26674 ls       1217123351.156515 USER  0x8102000 = malloc(2560)
    26674 ls       1217123351.156611 USER  0x8103800 = malloc(2048)
    26674 ls       1217123351.156702 USER  0x810b020 = malloc(20)
    26674 ls       1217123351.156881 USER  free(0x8101000)
    26674 ls       1217123351.157074 USER  0x8101000 = malloc(3191)
    26674 ls       1217123351.157191 USER  0x810c000 = malloc(4096)
    26674 ls       1217123351.157369 USER  0x810d000 = malloc(3219)
    26674 ls       1217123351.157431 USER  free(0x8101000)
    26674 ls       1217123351.157538 USER  free(0x810c000)
    26674 ls       1217123351.157743 USER  0x810e400 = malloc(524)
    26674 ls       1217123351.157865 USER  0x8104000 = malloc(1280)
    26674 ls       1217123351.157922 USER  0x8101040 = malloc(89)
    26674 ls       1217123351.157975 USER  0x81010a0 = malloc(90)
    26674 ls       1217123351.158065 USER  0x8101100 = malloc(89)
    26674 ls       1217123351.158170 USER  free(0x8101100)
    [...]

If your bug is a double-free bug, a bit of post-processing of this
output will quickly reveal it: look for a second free() of an address
that no allocation has returned again in between.  Then you can dump
more ktrace records in an effort to pinpoint the exact place where the
original allocation happens, and keep going from there.
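The post-processing can be as small as the following Python sketch (the
script name find_bad_frees.py and everything in it are hypothetical, not
part of FreeBSD).  It assumes the kdump line format shown in the sample
above, and it also assumes that realloc() records look like
"0xNEW = realloc(0xOLD, size)", which the sample does not show.  Rather
than only looking for repeated free() lines, it tracks which addresses
are currently allocated in each process, so an address that is freed,
handed out again and freed again (like 0x8101000 in the sample) is not
reported; only a free() of an address that is not currently allocated
is.

```python
import re
import sys

# Record formats as printed by kdump for malloc's utrace entries, following
# the sample output.  The realloc() form is an assumption; the sample has none.
ALLOC_RE = re.compile(r"USER\s+(0x[0-9a-f]+) = (?:malloc|calloc)\(\d+\)")
REALLOC_RE = re.compile(r"USER\s+(0x[0-9a-f]+) = realloc\((0x[0-9a-f]+), \d+\)")
FREE_RE = re.compile(r"USER\s+free\((0x[0-9a-f]+)\)")
NULL = "0x0"


def find_bad_frees(lines):
    """Return (line number, pid, pointer) for every free() of a pointer
    that is not currently allocated: a double free or an invalid free."""
    live = set()  # (pid, pointer) pairs handed out and not yet freed
    bad = []
    for lineno, line in enumerate(lines, 1):
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # not a kdump record line
        pid = fields[0]
        m = REALLOC_RE.search(line)
        if m:
            new, old = m.groups()
            if new != NULL:  # a failed realloc() leaves the old block live
                live.discard((pid, old))
                live.add((pid, new))
            continue
        m = ALLOC_RE.search(line)
        if m:
            if m.group(1) != NULL:  # skip failed allocations
                live.add((pid, m.group(1)))
            continue
        m = FREE_RE.search(line)
        if m and m.group(1) != NULL:  # free(NULL) is legal
            key = (pid, m.group(1))
            if key in live:
                live.remove(key)
            else:
                bad.append((lineno, pid, m.group(1)))
    return bad


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as f:
        for lineno, pid, ptr in find_bad_frees(f):
            print(f"line {lineno}: pid {pid}: free({ptr}) of a pointer "
                  "that is not currently allocated")
```

For example:

   # kdump -t u -f ktrace.out > trace.txt
   # python3 find_bad_frees.py trace.txt

Each reported line number points into trace.txt, so you can look at the
records around it and at the earlier allocation of the same address.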

If you see data changing 'under your feet', it's quite likely that you
are trying to use data after it has been freed.  A nice option that you
can _enable_ to catch that in action is 'J'.  By dumping the unexpected
data and using the info from malloc.conf(5)'s description of 'J', you
may find useful bits of information to track the bug down:

    J   Each byte of new memory allocated by malloc(), realloc()
        or reallocf() will be initialized to 0xa5.  All memory
        returned by free(), realloc() or reallocf() will be
        initialized to 0x5a.  This is intended for debugging and
        will impact performance negatively.
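Spotting these fill patterns in a dump is easy to script.  A minimal
Python sketch, assuming you have already copied the suspicious bytes out
of a debugger or core dump into a bytes object; the function names and
the minimum run length of 4 are arbitrary choices of this sketch, not
anything malloc.conf(5) defines.  Runs of 0x5a point at reads of memory
that has already been freed; runs of 0xa5 point at reads of newly
allocated memory that was never initialized.  A pointer whose every byte
is 0x5a, such as a fault address of 0x5a5a5a5a on a 32-bit system,
usually means a pointer was loaded from freed memory.

```python
# Fill bytes used by the 'J' option, per malloc.conf(5) as quoted above.
FILL_NEW = 0xA5    # new memory from malloc(), realloc() or reallocf()
FILL_FREED = 0x5A  # memory released by free(), realloc() or reallocf()
KIND = {FILL_NEW: "uninitialized (0xa5)", FILL_FREED: "freed (0x5a)"}


def junk_runs(data, min_run=4):
    """Return (offset, length, kind) for each run of at least min_run
    identical 'J' fill bytes in data."""
    runs = []
    i = 0
    while i < len(data):
        # Find the end of the run of bytes equal to data[i].
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1
        if data[i] in KIND and j - i >= min_run:
            runs.append((i, j - i, KIND[data[i]]))
        i = j
    return runs


def is_junk_word(value, width=4):
    """True if an integer (e.g. a faulting address) consists entirely of
    one 'J' fill byte, such as 0x5a5a5a5a on a 32-bit machine."""
    return any(value == int.from_bytes(bytes([b]) * width, "little")
               for b in KIND)


if __name__ == "__main__":
    dump = bytes.fromhex("0000000a" "5a5a5a5a5a5a5a5a" "a5a5a5a5")
    print(junk_runs(dump))
    # prints [(4, 8, 'freed (0x5a)'), (12, 4, 'uninitialized (0xa5)')]
    print(is_junk_word(0x5A5A5A5A), is_junk_word(0x8101000))
    # prints True False
```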



_______________________________________________
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
