On Fri, 25 Feb 2011 18:30:36 +0200, dsimcha <[email protected]> wrote:

== Quote from Vladimir Panteleev ([email protected])'s article
P.S. I'm currently in the process of tracking down a memory corruption
bug, which *might* result in a GC patch for D1. I'm also instinctively
worried that touching the GC code may introduce new memory corruption
bugs, which can be EXTREMELY hard to find. I've been chasing this one for
4 years.

I doubt it's a GC bug. If it's not a bug in your code, I'd be more inclined to assume it's a codegen bug, simply because the codegen is much larger and more
complex, and there are more opportunities for weird bugs that can only be
reproduced under very specific circumstances to creep in. Once you get past the superficial cruftiness and unreadability of the codebase and get a good conceptual
model of it, D's GC is actually pretty simple.

That's what I've been telling myself for the past few years as well. (I've written patches and a memory debugger for D and even attempted writing my own GCs, so I'm no stranger to D's GC.)

Also, I've been testing my patches by using the Phobos,
std.parallelism/parallelfuture, and dstats unittests, and by simply eating my own dogfood (i.e. using my modified GCs when running some simulations and stuff). So far, so good. Unfortunately, we don't have a specific GC test suite, but IMHO if it works on this much real-world code, it's unlikely that I've created any bugs.

How can you be so sure this is enough? The particular manifestation of the bug I was examining crashed my application 5 hours in. The GC attempted to traverse a free list that had ASCII in it, because an item occurred in the free list twice: the first instance had been allocated and used by the app to store text, while the second was still linked into the list. It was in the list twice because a freed (GC'd) object was manually deleted again when an element was removed from an associative array. That object had been freed in the first place because the GC never reached it, because its "parent" was marked as NOSCAN. And the parent was marked that way because the GC relies on NOSCAN being cleared on freed objects, and allocating in a destructor called during a GC breaks that assumption (and messes things up generally).
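A toy model of the free-list symptom may make that last link of the chain easier to picture. This is a minimal Python sketch, not D's GC code: `ToyFreeList` and its methods are hypothetical names, and a "slot" stands in for a fixed-size block whose storage doubles as the free-list link while the block is free. Freeing the same slot twice without a check puts it on the list twice, so it is handed out again and the app's data ends up where the collector expects a link.

```python
class ToyFreeList:
    """Hypothetical model of an intrusive LIFO free list (not D's GC code).

    Each entry of `memory` holds either the index of the next free slot
    (while the slot is on the free list) or application data (while it is
    allocated), mirroring how an allocator reuses a free block as the link.
    """

    END = -1

    def __init__(self, nslots):
        self.memory = [self.END] * nslots
        self.head = self.END
        for i in reversed(range(nslots)):
            self.free(i)

    def free(self, slot):
        # No double-free check: pushing a slot that is already on the list
        # makes it appear there twice.
        self.memory[slot] = self.head
        self.head = slot

    def alloc(self):
        slot = self.head
        if slot == self.END:
            raise MemoryError("out of slots")
        self.head = self.memory[slot]
        return slot

    def walk(self):
        """Traverse the free list the way a collector would, validating links."""
        seen = []
        node = self.head
        while node != self.END:
            if not isinstance(node, int) or not 0 <= node < len(self.memory):
                raise RuntimeError(f"free list corrupted: link is {node!r}")
            if len(seen) > len(self.memory):
                raise RuntimeError("free list corrupted: cycle detected")
            seen.append(node)
            node = self.memory[node]
        return seen


# Replay the scenario: the GC frees a block, then the app deletes it again.
fl = ToyFreeList(4)
a = fl.alloc()
fl.free(a)            # freed by the collector
fl.free(a)            # manually deleted a second time
x = fl.alloc()        # first instance handed out...
fl.memory[x] = "hello"  # ...and used by the app to store text
y = fl.alloc()        # the same slot is handed out again
```

After the replay, `x` and `y` are the same slot and the list head is the app's string, so a subsequent `fl.walk()` raises instead of finding a valid link.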

Are you at least running your tests with the GC debug options enabled (such as MEMSTOMP)? I hope your patches don't break them, either.

In case you missed my other reply, what I was aiming at is that something must be done when allocating from destructors. It must either reliably work or immediately fail, and definitely not corrupt the GC's state. Phobos allocates in destructors in a few places as well (std.zlib being one).
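One way to read "immediately fail" is a fail-fast guard: the collector sets a flag while it runs destructors, and any allocation request in that window raises instead of mutating allocator state. The sketch below is a hypothetical Python model of that option only; `ToyGC`, `InvalidMemoryOperation` and the `finalize` hook are illustrative names, not druntime's API. Making such allocations reliably work instead would require the allocator to tolerate reentry mid-collection, which this sketch does not attempt.

```python
class InvalidMemoryOperation(Exception):
    """Raised when the toy GC is asked to allocate while it is collecting."""


class ToyGC:
    """Hypothetical sketch of the 'fail immediately' policy (not D's GC).

    Heap objects may define `finalize(gc)`, standing in for D class
    destructors that the collector runs on unreachable objects.
    """

    def __init__(self):
        self.collecting = False
        self.heap = []

    def alloc(self, obj):
        if self.collecting:
            # Refuse up front instead of touching allocator state mid-sweep.
            raise InvalidMemoryOperation("allocation during collection")
        self.heap.append(obj)
        return obj

    def collect(self, unreachable):
        """Run finalizers for `unreachable` objects, then release them."""
        self.collecting = True
        try:
            for obj in unreachable:
                finalize = getattr(obj, "finalize", None)
                if finalize is not None:
                    finalize(self)
        finally:
            self.collecting = False
        # Release only after every finalizer has run without error.
        self.heap = [o for o in self.heap
                     if not any(o is u for u in unreachable)]


class Quiet:
    """An object with no destructor."""


class Greedy:
    """An object whose destructor tries to allocate."""

    def finalize(self, gc):
        gc.alloc(object())
```

The `finally` clause matters here: even when a destructor breaks the rule, the flag is reset and nothing is released, so the heap is left as it was before the collection rather than half-swept.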

--
Best regards,
 Vladimir                            mailto:[email protected]
