== Quote from nobody ([email protected])'s article
> Hi,
> I'm writing a data processing program in D, which deals with large amounts of
> small objects. One of the thing I found is that D's GC is horribly slow in
> such situation. I tried my program with gc enable & disabled (with some manual
> deletes). The GC disabled version (2 min) is ~100 times faster than the GC
> enabled version (4 hours)!
> But of course the GC disabled version still leak memory, it soon exceeds the
> machine memory limit when I try to process more data; while the GC enabled
> version don't have such problem.
> So my plan is to use the GC disabled version with manual deletes. But it was
> very hard to find all the memory leaks. I'm wondering: is there anyway to use
> GC as a leak detector? can the GC enabled version give me some help
> information on which objects get collected, so I can manually delete them in
> my GC disabled version? Thanks!
I've dealt with a number of similar situations in code I've written. Here are some tips that others have not already mentioned and that might be less drastic than going with fully manual memory management.

One thing you could try is disabling the GC (this really just disables automatic running of the collector) and running it manually at points where you know it makes sense. For example, you could insert a GC.collect() call at the end of every iteration of your main loop.

Another thing to try is avoiding appending to arrays. If you know the length in advance, you can get pretty good speedups by pre-allocating the array instead of appending with the ~= operator.

You can safely delete specific objects manually even when the GC is enabled. For very large objects with trivial lifetimes, this is probably worth doing, for two reasons. First, the GC will run less frequently. Second, D's GC is partially conservative, meaning that occasionally memory will not be freed when it should be, and the probability of this happening is proportional to the size of the memory block.

Lastly, I've been working on a generic second stack/mark-release allocator for D2, called TempAlloc. It's useful when you need to temporarily allocate memory in last in, first out order but can't use the call stack for whatever reason. I've also implemented a few basic data structures (hash tables and hash sets) that are specifically designed for this allocator. Right now it's coevolving with my dstats statistics lib, but if you want to try it, or at least look at it and give me some feedback, I'd like to eventually get it to the point where it can be added to Phobos and/or Tango. See http://svn.dsource.org/projects/dstats/docs/alloc.html .
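
To make the first two tips concrete, here is a rough sketch, assuming D2 with druntime's core.memory module. Record, buildBatch and processBatches are made-up names for illustration only; they are not from the original program or from dstats. The idea is just to turn off automatic collections, pre-allocate each array at its known length, and call GC.collect() once per pass of the main loop.

```d
import core.memory : GC;

// Hypothetical stand-in for the "small objects" in the question.
class Record
{
    int value;
    this(int value) { this.value = value; }
}

// The batch length is known up front, so the array is allocated once
// instead of being grown through repeated ~= appends.
Record[] buildBatch(size_t n)
{
    auto batch = new Record[](n);
    foreach (i, ref r; batch)
        r = new Record(cast(int) i);
    return batch;
}

// Hypothetical main loop: automatic collections are disabled, and the
// collector is run explicitly at the end of every iteration.
long processBatches(size_t batches, size_t batchSize)
{
    GC.disable();              // only suppresses automatic collections
    scope (exit) GC.enable();  // restore normal behaviour on the way out

    long total = 0;
    foreach (b; 0 .. batches)
    {
        auto batch = buildBatch(batchSize);
        foreach (r; batch)
            total += r.value;

        batch = null;          // drop the reference before collecting
        GC.collect();          // explicit collection at a known-good point
    }
    return total;
}

void main()
{
    import std.stdio : writeln;
    writeln(processBatches(10, 1000));
}
```

Whether one collection per iteration is the right granularity depends on how much garbage each pass produces; collecting every N iterations is an easy variation to measure.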
