I wanted to start a discussion about how various aspects of these new JVM languages are affected by (or how they affect) GC and allocation performance, and what the state-of-the-art answers are for making these languages run well.
Cliff Click's talk at JVMLS made it obvious that memory bandwidth and cache locality account for the lion's share of performance loss in modern applications, even typical Java or C ones. JVMs and compilers can do various tricks to improve this, but at the end of the day you're limited by how much you can fit into cache and how much back-and-forth you need across threads and between the CPU and main memory. Given that knowledge...

* Dynamic languages on the JVM have to use boxed numerics most of the time, which means we're creating a lot of numeric objects. Some of these may be nearly free and immediately collectable. Some may be eliminated by escape analysis in future versions of the JVM (e.g., the current JDK 7 builds, which have EA on by default). But even with the best tricks and the best GC, using objects for numerics is still going to be slower, on average, than using primitives (sketch 1 in the P.S. below). How do we cope with this?

* JVM languages that use closures are forced to heap-allocate structures in which to hold closed-over values (sketch 2). Every instantiation of such a closure allocates purely transient objects, populates them with data, and passes them down-stack for other code bodies to use. These objects are likely to be longer-lived, though still potentially "youngest" generation. More troublesome, however, is that no current JVM can inline closures through a megamorphic intermediate call, so we lose almost all inlining-based optimization opportunities, like escape analysis or reducing throw/catch to a jump.

* Languages like JRuby, which have to maintain "out of band" call stack data (basically a synthetic frame stack for cross-call state), must either keep a large pre-allocated frame stack in memory or allocate a frame for each call (sketch 3). Both have obvious GC/allocation/cache effects.

It seems like the current state-of-the-art GC for languages with these issues would be something G1-ish, where both "newborn" and "youthful" objects can be collected en masse. As far as allocation goes, escape analysis is the only solution that seems likely to help, but it depends on being able to inline... something that's hard to do for many calls in dynamic languages, and also for non-closure-converted calls in any of these languages.

Another wrinkle is the use of immutable structures, as in Clojure. I'm curious whether such systems generate more garbage than those that permit in-place-mutable structures (it seems to me they would; sketch 4), how that plays into memory bandwidth, allocation rates, and GC rates, and whether the bulk of the extra garbage stays young or gets tenured in typical usage.

We have been exploring the next steps for JRuby performance, and we have begun to suspect that we're hitting memory/allocation/GC bottlenecks in many (most?) cases. This is obviously harder to investigate than straight-up execution performance, since even looking at HotSpot's assembly output doesn't always make it clear what memory effects a piece of code will have.

So what have you all been seeing, and what tools are you using to investigate the memory/allocation/GC impact of your languages? (My own starting toolbox is in sketch 5.)

- Charlie
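P.S. A few hand-wavy sketches to make the above concrete. All names and numbers are invented for illustration; none of this is JRuby's (or anyone's) actual code.

Sketch 1: what generic numeric dispatch in a dynamic language boils down to, versus the primitive equivalent. The boxed loop can allocate a fresh Integer per iteration (Integer.valueOf only caches a small range of values), so the hot path is really an allocation benchmark:

    Object sum = Integer.valueOf(0);
    for (int i = 0; i < 1000000; i++) {
        // each iteration may allocate a new Integer; with the best GC
        // these are cheap young-gen garbage, but they're never free
        sum = Integer.valueOf(((Integer) sum).intValue() + i);
    }

    long primSum = 0;  // the primitive version allocates nothing
    for (int i = 0; i < 1000000; i++) primSum += i;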
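Sketch 2: closure conversion by hand, the way our compilers effectively end up doing it. A closed-over mutable local gets hoisted into a heap-allocated holder that the enclosing method and the closure body share; I'm using the classic one-element-array trick here, but a synthetic holder class works the same way:

    final int[] counter = new int[1];        // heap-allocated "frame" for the closure
    Runnable closure = new Runnable() {      // plus one allocation for the closure itself
        public void run() { counter[0]++; }  // all access goes through the holder
    };
    closure.run();

When the closure is then invoked through a shared, megamorphic call site (Runnable.run here stands in for a language's generic block interface), the JIT can't inline through it, and escape analysis can't prove the holder or the closure object dead.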
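Sketch 3: the two strategies for out-of-band frame data. Frame and FrameStack are made-up names, not JRuby's actual classes, and I've elided overflow handling:

    class Frame { Object self; Object block; String methodName; }

    class FrameStack {
        private final Frame[] stack = new Frame[1024];   // pay for the frames once...
        private int top = -1;
        FrameStack() {
            for (int i = 0; i < stack.length; i++) stack[i] = new Frame();
        }
        Frame push() { return stack[++top]; }            // ...then reuse: no per-call garbage,
        void pop()   { top--; }                          // but a permanently live block of memory
    }

    // versus allocating per call:
    //   Frame f = new Frame();   // young-gen garbage on every single invocation

The pre-allocated version trades allocation/GC pressure for an always-resident, cache-unfriendly chunk of heap; the per-call version churns the young generation.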
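Sketch 4: why immutable structures should generate more garbage than mutable ones, in the simplest possible form. Every "update" allocates a replacement and orphans the old version:

    final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        Point withX(int newX) { return new Point(newX, y); }  // copy on "write"
    }

    Point p = new Point(0, 0);
    for (int i = 0; i < 1000000; i++)
        p = p.withX(i);   // one dead Point per iteration; a mutable
                          // p.x = i would allocate nothing

Clojure's persistent collections are far smarter than this (path copying shares most of each structure across versions), but the allocate-on-update pattern is the same.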
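Sketch 5: the first tools I'd reach for myself (all standard HotSpot/JDK options, nothing exotic):

    java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps ...  # GC frequency and pause breakdown
    java -XX:+PrintTLAB ...                                          # per-thread allocation rates
    jstat -gcutil <pid> 1000                                         # generation occupancy over time
    jmap -histo <pid>                                                # which classes dominate the heap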
