On Sat, 26 Jul 2008 17:36:35 -0700, Doug Hardie <[EMAIL PROTECTED]> wrote:
> On Jul 26, 2008, at 17:10, Kris Kennaway wrote:
>> Firstly, if you did not recompile the program under 7.0 then it is not
>> using the new malloc at all.
> It was recompiled.  All there is on the system is new stuff.  It was
> built from scratch when 7.0 came out.
>> If you did recompile it and it is behaving differently then it is
>> probably because your program contains bugs in how it manages memory
>> that happened to be working by accident with the old memory
>> allocator. e.g. because you were making use of memory after it had
>> been freed, but before the allocator returned it to some other
>> malloc() call.
> That is certainly possible.  However, the program has worked under
> considerable load for many years with versions 3.7 to 6.2.  Problems
> only occur with 7.0.  The program is quite complex and big.  It uses
> probably hundreds of mallocs in a typical use.  The problems only
> occur reasonably randomly and only under quite heavy load.  The
> developer is looking into it, but the problem only occurs on FreeBSD
> 7.0, not any other Unix systems.  In the meantime I am losing money
> because of it.

While that's understandable, the current malloc() has undergone quite
extensive testing by Jason Evans and a lot of people who use it in
FreeBSD 7.X or later.  Its ability to expose bugs in this way was deemed
important enough that it is now used by other projects too.

What Kris wrote:

    Finally, there is no way to revert to the "old approach"
    because the new allocator is completely new; it allocates
    memory based on its own strategy.  None of the malloc options
    affect the behaviour of correct programs (but some of them
    can help to improve performance, or to debug incorrect

is important.  Even if you tweak enough options, the new malloc() may
*not* behave similarly enough to the old one for the program to keep
working.  If you are losing money _right_ _now_ because of problems in
the program, it may be worth going back to 6-STABLE and the old malloc()
until the developers have fixed the bugs in the program.

> Not surprising but I seem to recall that when it was first introduced
> into stable that there was some discussion on how to make it look more
> like the old malloc.  I couldn't find that via a search though.

If all else fails, you can try forward-porting phkmalloc to 7.X but it's
not necessarily easier than going temporarily back to 6.X and fixing the
program to work correctly on 7.X.

It basically all boils down to ``How much time do you want to spend with
a possibly crashing service?''

There's definitely a bug somewhere and you ultimately need it resolved.
It is highly unlikely that it is in malloc() itself, but you can
probably use its debugging features to help you find out if it is a bug
in malloc() (see the preprocessor define MALLOC_PRODUCTION in
libc/stdlib/malloc.c), or if it is a bug in the program using malloc() and
_where_ it may be.

The new malloc() also includes an option that can dump 'utrace' debug
output of all the malloc(), calloc(), realloc(), posix_memalign() and
free() calls of malloc.c.  If you haven't tried it already, it may be
another useful tool to help you track down where the bug is.

Tracing a program's malloc usage with the 'U' option is relatively easy
to do if you spawn just *this* program with MALLOC_OPTIONS='U':

    # ktrace env MALLOC_OPTIONS='U' your-program-here

Then you can dump the 'utrace' entries logged by ktrace, with:

    # kdump [optionally, more kdump options] -f ktrace.out

You should see something like this:

    $ kdump -T -t u -f ktrace.out | head -40
     26674 ls       1217123351.156040 USER  malloc_init()
     26674 ls       1217123351.156369 USER  0x8101000 = malloc(4096)
     26674 ls       1217123351.156515 USER  0x8102000 = malloc(2560)
     26674 ls       1217123351.156611 USER  0x8103800 = malloc(2048)
     26674 ls       1217123351.156702 USER  0x810b020 = malloc(20)
     26674 ls       1217123351.156881 USER  free(0x8101000)
     26674 ls       1217123351.157074 USER  0x8101000 = malloc(3191)
     26674 ls       1217123351.157191 USER  0x810c000 = malloc(4096)
     26674 ls       1217123351.157369 USER  0x810d000 = malloc(3219)
     26674 ls       1217123351.157431 USER  free(0x8101000)
     26674 ls       1217123351.157538 USER  free(0x810c000)
     26674 ls       1217123351.157743 USER  0x810e400 = malloc(524)
     26674 ls       1217123351.157865 USER  0x8104000 = malloc(1280)
     26674 ls       1217123351.157922 USER  0x8101040 = malloc(89)
     26674 ls       1217123351.157975 USER  0x81010a0 = malloc(90)
     26674 ls       1217123351.158065 USER  0x8101100 = malloc(89)
     26674 ls       1217123351.158170 USER  free(0x8101100)

If your bug is a double free, a bit of post-processing of this output
will quickly reveal it: look for a free() of an address that no
allocation call has returned since that address was last freed.  Then
you can dump more ktrace records, in an effort to pinpoint the exact
place where the original allocation happens, and you can keep going
from there.
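
As a rough sketch of that post-processing, assuming you have saved the
kdump output as text (for example with `kdump -T -t u -f ktrace.out >
trace.txt`), something like the following can flag suspicious free()
calls.  The script and function names are made up for illustration.  The
malloc and free patterns follow the sample output above; the realloc
pattern is my assumption about how kdump prints those records, so adjust
it to what your trace really shows.

```python
import re
import sys

# Record shapes taken from the sample trace above.
MALLOC = re.compile(r"(0x[0-9a-f]+) = malloc\(")
FREE = re.compile(r"free\((0x[0-9a-f]+)\)")
# Assumed shape of a realloc record; the sample trace does not contain one.
REALLOC = re.compile(r"(0x[0-9a-f]+) = realloc\((0x[0-9a-f]+), ")


def find_suspect_frees(lines):
    """Return (line number, address) for each free() of an address that
    is not currently allocated according to the trace."""
    live = set()
    suspects = []
    for lineno, line in enumerate(lines, 1):
        m = REALLOC.search(line)
        if m:
            new, old = m.groups()
            if int(new, 16) != 0:      # a failed realloc leaves 'old' alive
                live.discard(old)
                live.add(new)
            continue
        m = MALLOC.search(line)
        if m:
            live.add(m.group(1))
            continue
        m = FREE.search(line)
        if m:
            addr = m.group(1)
            if int(addr, 16) == 0:     # free(NULL) is harmless
                continue
            if addr in live:
                live.remove(addr)
            else:
                suspects.append((lineno, addr))
    return suspects


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as trace:
        for lineno, addr in find_suspect_frees(trace):
            print(f"line {lineno}: free({addr}) without a live allocation")
```

Note that an address may legitimately be freed more than once if it was
handed out again in between, as 0x8101000 is in the trace above, which is
why the script tracks the set of live addresses instead of just counting
free() calls.  It also treats the whole trace as one heap, so it is only
meaningful for a single traced process.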

If you see data changing 'under your feet' it's quite likely that you
are trying to use data after it has been freed.  A nice option that you
can _enable_ to catch that in action is 'J'.  By dumping the unexpected
data and using the info from malloc.conf(5)'s description of 'J' you may
find useful bits of information to track the bug down:

     J   Each byte of new memory allocated by malloc(), realloc()
         or reallocf() will be initialized to 0xa5.  All memory
         returned by free(), realloc() or reallocf() will be
         initialized to 0x5a.  This is intended for debugging and
         will impact performance negatively.
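
If you have the unexpected data as raw bytes (from a core file or a
debugging dump), a small helper can point out where the junk patterns
show up.  This is only a sketch: the function name and the minimum run
length of 4 bytes are arbitrary choices of mine, and the two byte values
are the ones from the description of 'J' quoted above.

```python
from itertools import groupby

# Byte values from the malloc.conf(5) description of the 'J' option.
JUNK = {0xa5: "allocated, never written", 0x5a: "freed"}


def junk_runs(data, min_len=4):
    """Return (offset, length, meaning) for each run of junk bytes in
    'data' that is at least 'min_len' bytes long."""
    runs = []
    offset = 0
    for byte, group in groupby(data):
        length = sum(1 for _ in group)
        if byte in JUNK and length >= min_len:
            runs.append((offset, length, JUNK[byte]))
        offset += length
    return runs
```

A long run of 0x5a inside a structure you are still using strongly
suggests that the memory was freed while something still held a pointer
to it.  A run of 0xa5 suggests a field that was read before it was ever
initialized.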

freebsd-questions@freebsd.org mailing list