On Jul 26, 2008, at 19:03, Giorgos Keramidas wrote:
While that's understandable, the current malloc() has undergone quite
extensive testing by Jason Evans and a lot of people who use it in
FreeBSD 7.X or later. Its ability to expose bugs in this way was deemed
important enough that it is now used by other projects too.
While in general I like the new approach, this problem has been a
killer. I did find a number of errors in my own code where I was not
allocating enough space for some things. Those showed up instantly
with 7.0 and were easy to fix.
What Kris wrote in:
Finally, there is no way to revert to the "old approach"
because the new allocator is completely new; it allocates
memory based on its own strategy. None of the malloc options
affect the behaviour of correct programs (but some of them
can help to improve performance, or to debug incorrect
programs).
is a bit important. Even if you tweak enough options the new malloc()
may *not* work similarly enough for the program to keep working. If you
are losing money _right_ _now_ because of problems in the program, it
may be worth going back to 6-STABLE and the old malloc() until the bugs
of the program have been fixed by the developers.
Unfortunately that is not possible. We upgraded the hardware and some
of the components were not supported very well under 6.x. Despite
several weeks of testing of the new hardware and 7.0, the problem did
not arise till several weeks after going into production. It takes
about a week of real time before the problem tends to become visible.
By compressing the workload I have been able to set up a test machine
such that it takes 2-4 days before it occurs.
Not surprising, but I seem to recall that when it was first introduced
into stable there was some discussion on how to make it look more like
the old malloc. I couldn't find that via a search though.
If all else fails, you can try forward-porting phkmalloc to 7.X but it's
not necessarily easier than going temporarily back to 6.X and fixing the
program to work correctly on 7.X.
It basically all boils down to ``How much time do you want to spend with
a possibly crashing service?''
There's definitely a bug somewhere and you ultimately need it resolved.
It is highly unlikely that it is in malloc() itself, but you can
probably use its debugging features to help you find out if it is a bug
in malloc() (see the preprocessor define MALLOC_PRODUCTION in
libc/stdlib/malloc.c), or if it is a bug in the program using malloc()
and _where_ it may be.
The new malloc() also includes an option that can dump 'utrace' debug
output of all the malloc(), calloc(), realloc(), posix_memalign() and
free() calls of malloc.c. If you haven't tried it already, it may be
another useful tool to help you track down where the bug is.
Tracing a program's malloc usage with the 'U' option is relatively easy
to do if you spawn just *this* program with MALLOC_OPTIONS='U':
# ktrace env MALLOC_OPTIONS='U' your-program-here
Then you can dump the 'utrace' entries logged by ktrace, with:
# kdump [optionally, more kdump options] -f ktrace.out
You should see something like this:
$ kdump -T -t u -f ktrace.out | head -40
26674 ls 1217123351.156040 USER malloc_init()
26674 ls 1217123351.156369 USER 0x8101000 = malloc(4096)
26674 ls 1217123351.156515 USER 0x8102000 = malloc(2560)
26674 ls 1217123351.156611 USER 0x8103800 = malloc(2048)
26674 ls 1217123351.156702 USER 0x810b020 = malloc(20)
26674 ls 1217123351.156881 USER free(0x8101000)
26674 ls 1217123351.157074 USER 0x8101000 = malloc(3191)
26674 ls 1217123351.157191 USER 0x810c000 = malloc(4096)
26674 ls 1217123351.157369 USER 0x810d000 = malloc(3219)
26674 ls 1217123351.157431 USER free(0x8101000)
26674 ls 1217123351.157538 USER free(0x810c000)
26674 ls 1217123351.157743 USER 0x810e400 = malloc(524)
26674 ls 1217123351.157865 USER 0x8104000 = malloc(1280)
26674 ls 1217123351.157922 USER 0x8101040 = malloc(89)
26674 ls 1217123351.157975 USER 0x81010a0 = malloc(90)
26674 ls 1217123351.158065 USER 0x8101100 = malloc(89)
26674 ls 1217123351.158170 USER free(0x8101100)
[...]
If your bug is a double-free bug, then a bit of post-processing of this
output will quickly reveal it: a double free shows up as a second free()
of an address with no intervening malloc() that returned it. Then you
can dump more ktrace records, in an effort to pinpoint the exact place
where the original allocation happens, and you can keep going from
there.
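The post-processing described above could look roughly like the
following Python sketch. It is only an illustration, not a tested tool:
the function name find_suspect_frees is made up here, the regular
expressions assume the line layout shown in the sample kdump output, and
the "= realloc(old, size)" form is an assumption about how kdump prints
realloc() records. Addresses are tracked per PID, since each process has
its own address space.

```python
import re

# Patterns for the tail of a kdump line, e.g.
#   26674 ls 1217123351.156369 USER 0x8101000 = malloc(4096)
#   26674 ls 1217123351.156881 USER free(0x8101000)
# The realloc form is an assumption about kdump's output format.
MALLOC_RE = re.compile(r"USER\s+(0x[0-9a-f]+) = malloc\(\d+\)")
REALLOC_RE = re.compile(r"USER\s+(0x[0-9a-f]+) = realloc\((0x[0-9a-f]+), \d+\)")
FREE_RE = re.compile(r"USER\s+free\((0x[0-9a-f]+)\)")


def find_suspect_frees(lines):
    """Return (line number, pid, address) for each free() of an address
    that was allocated earlier in the trace but is not live any more."""
    live = set()      # (pid, address) pairs currently allocated
    seen = set()      # every (pid, address) the allocator has returned
    suspects = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        pid = fields[0]   # first column of kdump output is the PID

        m = REALLOC_RE.search(line)
        if m:
            new, old = m.group(1), m.group(2)
            live.discard((pid, old))   # the old block is released
            live.add((pid, new))
            seen.add((pid, new))
            continue

        m = MALLOC_RE.search(line)
        if m:
            live.add((pid, m.group(1)))
            seen.add((pid, m.group(1)))
            continue

        m = FREE_RE.search(line)
        if m:
            key = (pid, m.group(1))
            if key in live:
                live.remove(key)
            elif key in seen:
                # Freed again without being handed out in between.
                suspects.append((lineno, pid, m.group(1)))
    return suspects
```

Saving the kdump output to a file and passing the open file object to
find_suspect_frees() is enough to get the line numbers worth a closer
look; the timestamps on those lines can then be matched against the rest
of the ktrace records.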
If you see data changing 'under your feet' it's quite likely that you
are trying to use data after it has been freed. A nice option that you
can _enable_ to catch that in action is 'J'. By dumping the unexpected
data and using the info from malloc.conf(5)'s description of 'J' you may
find useful bits of information to track the bug down:
J Each byte of new memory allocated by malloc(), realloc()
or reallocf() will be initialized to 0xa5. All memory
returned by free(), realloc() or reallocf() will be
initialized to 0x5a. This is intended for debugging and
will impact performance negatively.
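Once the unexpected data has been dumped (from a debugger or a core
file, for example), checking it against the two fill bytes quoted above
is simple. The Python sketch below is a hypothetical helper, not part of
any FreeBSD tool: the name classify_junk and the 0.9 threshold are
assumptions made for illustration, and a buffer that was partly
overwritten after being junk-filled may well fall under that threshold.

```python
def classify_junk(data, threshold=0.9):
    """Guess what a dumped buffer holds, based on the 'J' fill bytes.

    data is a bytes object; threshold (an arbitrary choice) is the
    fraction of bytes that must match a fill pattern.
    """
    if not data:
        return "empty"
    total = len(data)
    # 0xa5: memory handed out by malloc() but never written by the program
    if data.count(0xa5) / total >= threshold:
        return "uninitialized"
    # 0x5a: memory that has already been given back with free()
    if data.count(0x5a) / total >= threshold:
        return "freed"
    return "unknown"
```

A result of "freed" for a structure the program is still reading points
at a use-after-free; "uninitialized" suggests the program relies on
malloc() returning zeroed memory.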
_______________________________________________
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions