On Fri, 25 Feb 2011 18:30:36 +0200, dsimcha <[email protected]> wrote:
> == Quote from Vladimir Panteleev ([email protected])'s article
>> P.S. I'm currently in the process of tracking down a memory corruption
>> bug, which *might* result in a GC patch for D1. I'm also instinctively
>> worried that touching the GC code may introduce new memory corruption
>> bugs, which can be EXTREMELY hard to find. I've been chasing this one
>> for 4 years.
>
> I doubt it's a GC bug. If it's not a bug in your code, I'd be more
> inclined to assume it's a codegen bug, simply because the codegen is much
> larger and more complex, and there are more opportunities for weird bugs
> that can only be reproduced under very specific circumstances to creep in.
> Once you get past the superficial cruftiness and unreadability of the
> codebase and get a good conceptual model of it, D's GC is actually pretty
> simple.

That's what I've been telling myself for the past few years as well. (I've
written patches and a memory debugger for D and even attempted writing my
own GCs, so I'm no stranger to D's GC.)

> Also, I've been testing my patches by using the Phobos,
> std.parallelism/parallelfuture, and dstats unittests, and by simply
> eating my own dogfood (i.e. using my modified GCs when running some
> simulations and stuff). So far, so good. Unfortunately, we don't have a
> specific GC test suite, but IMHO if it works on this much real-world
> code, it's unlikely that I've created any bugs.

How can you be so sure this is enough? The particular manifestation of the
bug I was examining crashed my application 5 hours in, because the GC
attempted to traverse a free list which had ASCII in it because the item
had been allocated but it occurred in the free list twice (so the first
instance was used by the app to store text), because a freed (GC'd) object
was manually deleted again when an element was removed from an associative
array, and it was freed initially because the GC never reached it, because
its "parent" was marked as NOSCAN, because the GC relies on NOSCAN being
cleared on freed objects, and allocating in a destructor called during a
GC breaks that assumption (and messes things up generally).
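
To make the last links of that chain concrete, here is a toy Python model of a
free list whose links live inside the free blocks themselves. It is only a
sketch and not D's GC: ToyHeap, the block count and the stored string are all
made up for illustration. It shows how freeing one block twice leaves it on
the free list after the app has received it, so the text the app writes ends
up where the allocator expects a link.

```python
# Toy model only: names and layout are hypothetical, not taken from D's GC.

END = -1  # sentinel marking the end of the free list


class ToyHeap:
    def __init__(self, nblocks):
        # Each block holds one value. While a block is free, that value is
        # the index of the next free block (the link lives in the block).
        self.blocks = [END] * nblocks
        self.head = END
        for i in reversed(range(nblocks)):
            self.free(i)

    def free(self, i):
        # No check that block i is already free: this is the hazard.
        self.blocks[i] = self.head
        self.head = i

    def alloc(self):
        i = self.head
        if i == END:
            raise MemoryError("toy heap exhausted")
        link = self.blocks[i]
        if not isinstance(link, int):
            # The allocator found application data where a link should be.
            raise RuntimeError("free list corrupted: link is %r" % (link,))
        self.head = link
        return i


def demo():
    heap = ToyHeap(4)
    a = heap.alloc()
    heap.free(a)             # first free (think: the collector sweeps it)
    heap.free(a)             # second free (think: a manual delete)
    b = heap.alloc()         # the app is handed the block...
    heap.blocks[b] = "text"  # ...and stores ASCII in it
    try:
        heap.alloc()         # but the block is still on the free list
    except RuntimeError as e:
        return b, str(e)
    return b, None
```

The toy allocator notices the bad link only because Python values carry
their type. A real allocator would just follow the ASCII bytes as if they
were a pointer, which is why the failure can show up hours later and far from
its cause.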
Are you at least running your tests with the GC debug options enabled
(such as MEMSTOMP)? I hope your patches don't break them, either.
In case you missed my other reply, what I was aiming at is that something
must be done when allocating from destructors. It must either reliably
work or immediately fail, and definitely not corrupt the GC's state.
Phobos allocates in destructors in a few places as well (std.zlib being
one).
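
As a rough sketch of the "immediately fail" option, assuming a collector that
keeps a flag while it runs finalizers (ToyCollector, Buffered and the error
message are hypothetical, not the D runtime's API):

```python
# Toy model only: refuse allocation while a collection is in progress.

class ToyCollector:
    def __init__(self):
        self.collecting = False
        self.live = []

    def alloc(self, obj):
        if self.collecting:
            # Fail loudly instead of touching collector state mid-sweep.
            raise RuntimeError("allocation during collection")
        self.live.append(obj)
        return obj

    def collect(self, garbage):
        self.collecting = True
        try:
            for obj in garbage:
                obj.finalize(self)  # stands in for running a destructor
        finally:
            self.collecting = False
        dead = {id(obj) for obj in garbage}
        self.live = [obj for obj in self.live if id(obj) not in dead]


class Buffered:
    def finalize(self, collector):
        # A destructor that allocates, which is the case under discussion.
        collector.alloc(bytearray(16))


def demo_guard():
    collector = ToyCollector()
    obj = collector.alloc(Buffered())
    try:
        collector.collect([obj])
    except RuntimeError as e:
        return collector, str(e)
    return collector, None
```

Here the offending destructor is reported at the point of the allocation and
the collector's bookkeeping is left as it was before the collection started.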
--
Best regards,
Vladimir mailto:[email protected]