Apologies in advance for contributing to an obviously and increasingly 
off-topic thread, but this kind of FUD about GC is a pet peeve of mine.

On May 6, 2011, at 10:04 AM, Neal Becker wrote:

> http://gcc.gnu.org/ml/gcc/2002-08/msg00552.html

Counterpoint: <http://lwn.net/Articles/268783/>.  Sorry, Linus: sometimes 
correctness matters more than performance.

But, even the performance argument is kind of bogus.  See, for example, this 
paper on real-time garbage collection: 
<http://domino.research.ibm.com/comm/research_people.nsf/pages/dgrove.ecoop07.html>.
  That's just one example of an easy-to-find solution to a problem that Linus 
holds up as unsolved or unsolvable.  There are solutions to pretty much all of 
the problems that Linus brings up.  One of these solutions is even famously 
implemented by CPython!  The CPython "string +=" idiom optimization fixes at 
least one case of the "you tend to always copy the node" antipattern Linus 
describes, and lots of languages (especially Scheme and its derivatives, IIRC) 
have very nice optimizations in this area.  One could argue that any functional 
language that avoids large pools of mutable state (e.g. Erlang) is a massive 
optimization for this case.
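
To make the CPython example concrete: the interpreter can resize a string in 
place when nothing else references it, so the naive += loop usually runs in 
roughly linear rather than quadratic time.  A sketch you can run yourself - the 
timings are illustrative, and the in-place resize is a CPython implementation 
detail, not a language guarantee:

    import timeit

    def concat(n):
        # CPython can resize 's' in place while its refcount is 1,
        # so this loop usually runs in roughly linear time.
        s = ''
        for i in range(n):
            s += 'x'
        return s

    def concat_aliased(n):
        # Holding a second reference to the string defeats the
        # optimization: every += must copy the whole string.
        s = ''
        for i in range(n):
            alias = s
            s += 'x'
        return s

    print(timeit.timeit(lambda: concat(100000), number=10))
    print(timeit.timeit(lambda: concat_aliased(100000), number=10))

The second loop degrades to the quadratic behavior that is exactly the 
"always copy" pattern Linus complains about.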

Another example: the "dirty cache" problem Linus talks about can be addressed 
by having a GC that cooperates with the VMM: 
<http://www.cs.umass.edu/~emery/pubs/f034-hertz.pdf>.

And the "re-using stuff as fast as possible" thing is exactly the kind of 
problem that generational GCs address.  When you run out of space in cache, you 
reap your first generation before you start copying stuff.  One of the key 
insights of generational GC is that you'll usually reclaim enough (in this 
case, cache-local) memory that you can keep going for a little while.  You 
don't have to read a super fancy modern paper on this; Wikipedia explains it 
nicely: 
<http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)#Generational_GC_.28ephemeral_GC.29>.
  Of course, if you don't tune your GC at all for your machine's cache size, 
you won't see this performance benefit play out.
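
Conveniently, CPython's own cycle collector is generational (three 
generations), so you can watch the mechanism at work and tune its thresholds 
at runtime.  A small sketch, not a tuning recommendation - the exact counts 
you see will vary:

    import gc

    # Three generations; collecting generation N also collects
    # everything younger than N.
    print(gc.get_threshold())  # defaults to (700, 10, 10)
    print(gc.get_count())      # current per-generation counts

    class Node(object):
        def __init__(self):
            self.ref = self  # a self-cycle only the GC can reclaim

    junk = [Node() for _ in range(1000)]
    del junk

    # Collect just the youngest generation (some cycles may already
    # have been promoted by an automatic pass along the way).
    print(gc.collect(0))

    # Raising the gen-0 threshold trades memory for fewer, larger
    # collections; lowering it does the reverse.
    gc.set_threshold(10000, 10, 10)

Note that CPython's generations are sized by object counts rather than by 
cache footprint, so this shows the mechanism, not the cache-aware tuning 
described above.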

I don't know if there's a programming language and runtime with a real-time, 
VM-cooperating garbage collector that actually exists today which has all the 
bells and whistles required to implement an OS kernel, so I wouldn't give the 
Linux kernel folks too much of a hard time for still using C; but there's 
nothing wrong with the idea in the abstract.  The performance differences 
between automatic and manual memory management are dubious at best, and with a 
really good GC and a language that supports it, GC tends to win big.  When it 
loses, it loses in ways that can be fixed in one area of the code (the GC) 
rather than via millions of tiny fixes scattered across your whole codebase, 
as manual memory-management strategies require.

The assertion that "modern hardware" is not designed for big data-structure 
pointer-chasing is also a bit silly.  On the contrary, modern hardware has 
evolved staggeringly massive caches, specifically because large programs 
(whether they're GC'd or not) tend to do lots of this kind of thing, because 
there's a certain level of complexity beyond which one can no longer avoid it.  
It's old hardware, with tiny caches (that were, by virtue of their tininess, 
closer to the main instruction-processing silicon), that was optimized for the 
"carefully stack-allocating everything in the world to conserve cache" approach.

You can see this pretty clearly by running your favorite Python benchmark on 
two machines that are similar except for cache size.  The newer machine, with 
the bigger cache, will run Python considerably faster, but the extra cache 
doesn't help the average trivial C benchmark that much - or, for that matter, 
Linux benchmarks.
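
A rough sketch of the kind of comparison I mean - a pointer-chasing workload 
next to a flat arithmetic loop; the absolute numbers are meaningless, it's the 
ratio across machines that's interesting:

    import timeit

    def pointer_chase(n):
        # Build a cons-style chain of pairs, then walk it: every
        # step is a dereference, so cache behavior dominates.
        node = None
        for i in range(n):
            node = (i, node)
        total = 0
        while node is not None:
            total += node[0]
            node = node[1]
        return total

    def arithmetic(n):
        # A trivial loop with essentially no heap traversal.
        total = 0
        for i in range(n):
            total += i
        return total

    for fn in (pointer_chase, arithmetic):
        print('%s: %.3fs' % (fn.__name__,
                             timeit.timeit(lambda: fn(200000), number=5)))

The expectation (an assumption on my part, not a measurement) is that the 
pointer_chase-to-arithmetic ratio improves noticeably on the larger-cache 
machine.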

-glyph
