On Tue, Nov 17, 2009 at 6:48 AM, Rich Hickey <[email protected]> wrote:
> I think it is essential to separate the *two* problems with boxed
> numbers. One problem is arithmetic with boxed numbers. EA, inlining,
> clever compilers etc can certainly do something about this. Frederik
> Öhrström gave an impressive demonstration at the JVM language summit
> showing how even large structures used as numbers could provide
> sufficient information to the compiler to enable their complete
> removal in arithmetic.

And I have seen, for very limited cases, that some boxed numerics are
eliminated in the final assembly. But not enough of them to make a
difference when *all* numerics are boxed values, as in JRuby or
Clojure :(
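
To make concrete what I mean by "eliminated in the final assembly",
here's a toy Java case (illustrative only, not JRuby's actual
codegen): the box in the first method never escapes, so escape
analysis plus scalar replacement can sometimes reduce the loop to
plain int arithmetic, while the boxes in the second method escape
into a list and have to stay real heap objects.

    // Illustrative only -- not JRuby codegen. The Integer below never
    // leaves sumLocal(), so EA + scalar replacement can eliminate it.
    static int sumLocal(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            Integer boxed = new Integer(i);  // non-escaping allocation
            sum += boxed;                    // auto-unboxed
        }
        return sum;
    }

    // These boxes escape into the collection, so every one must be a
    // real object -- the case EA can't help with.
    static java.util.List<Integer> collect(int n) {
        java.util.List<Integer> out = new java.util.ArrayList<Integer>();
        for (int i = 0; i < n; i++) {
            out.add(Integer.valueOf(i));
        }
        return out;
    }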

Because of this I've resigned myself to the fact that without JVM
help we're not going to be able to compete with other Ruby
implementations on raw numeric performance, since all the C/C++-based
Ruby impls use real fixnums. And that makes me sad.

FWIW, we can come within 2x of the fastest native impls' numeric
performance (on simple things like fib), so the JVM is certainly
doing a crackerjack job of making boxed numerics as fast as possible.
It's just not fast enough.
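
In case the shape of that comparison isn't obvious, here it is in
plain Java -- a sketch, not JRuby's generated code -- where the boxed
version is roughly the situation a language that boxes every number
puts the JVM in:

    // Primitive fib: what a Java programmer would write.
    static long fib(long n) {
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }

    // Boxed fib: every intermediate value is (logically) a heap
    // object -- a rough stand-in for what JRuby or Clojure ask the
    // JVM to optimize when all numerics are boxed.
    static Long fibBoxed(Long n) {
        return n < 2 ? n : fibBoxed(n - 1) + fibBoxed(n - 2);
    }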

> The second, significant, and currently unavoidable problem with
> objects as numbers is that they do, in fact, escape. All the time. In
> order to be stored in our data structures. Obtaining their value
> requires a pointer deref, and when incorporated into the long-term
> information maintained by an application, they move out of the young
> gen. They might increase the number of active older objects by an
> order of magnitude or more vs a comparable Java program using
> primitives as fields and in collections. It is this latter problem
> that is bigger, and that fixnums address in a way I think nothing else
> can. Until references can be made to hold both an object reference on
> the heap and a number not on the heap, we are doomed to have to choose
> one. And choosing objects yields a terrible memory profile, in pointer
> nav, cache misses and GC.

This is one I have worried about. With all those boxed numbers
floating around, we eventually end up tenuring *numbers*, for Pete's
sake, consuming space in the older generations and increasing GC
costs there. Like Clojure, we have a cache for a small range of
fixnums, but it's not good enough for anything beyond the simplest
iterations and algorithms. The value of caching a larger range of
fixnums diminishes rapidly, and a bigger cache also hurts startup
time, since we have to pre-populate it at boot. I agree there's no
good answer here, right now.
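
To sketch the kind of cache I'm talking about (illustrative Java, not
JRuby's actual code, and the class name is made up):

    // A fixed table of pre-boxed values for a small range; anything
    // outside the range allocates a fresh box that may get tenured.
    final class SmallIntCache {
        private static final int LOW = -128, HIGH = 127;
        private static final Integer[] CACHE =
            new Integer[HIGH - LOW + 1];

        static {
            // This loop is the startup cost: widen the range and you
            // pay for it on every boot, used or not.
            for (int i = 0; i < CACHE.length; i++) {
                CACHE[i] = new Integer(LOW + i);
            }
        }

        static Integer box(int value) {
            if (value >= LOW && value <= HIGH) {
                return CACHE[value - LOW];  // shared, long-lived
            }
            return new Integer(value);      // fresh box
        }
    }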

Maybe we should start a pool to hire someone to implement fixnums.
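
And for anyone chipping into that pool, here's roughly the trick
they'd be implementing -- sketched against a plain long, because a
Java reference can't carry a tag bit today, which is of course the
whole problem:

    // Low-bit tagging as the C/C++ impls do it, shown on a raw
    // 64-bit word (costing one bit of range). A real fixnum would
    // need the JVM to do this at the reference level.
    static long toFixnum(long value) {
        return (value << 1) | 1;    // tag bit set = immediate integer
    }

    static boolean isFixnum(long word) {
        return (word & 1) == 1;     // untagged words stay heap pointers
    }

    static long fixnumValue(long word) {
        return word >> 1;           // arithmetic shift recovers the value
    }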

> I think you'll find the actual time cost of overflow checking is
> dwarfed by other things. CPU parallelism seems to do a decent job with
> overflow checks.

Assuming the overflow checks that eventually make it to asm are
being done in the most efficient way possible. I'll try to dig up the
actual x86 instructions later and we can discuss whether they're as
quick as they ought to be.
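
For reference, here's the check I'd expect to see feeding that asm --
the standard sign-bit trick for a checked long add, as a sketch
rather than what JRuby actually emits. It's a couple of extra ALU ops
and a highly predictable branch, which I assume is what Rich means by
CPU parallelism hiding the cost:

    // Sketch of a checked add: overflow happened iff both operands
    // have the same sign and the result's sign differs from theirs.
    static long addChecked(long a, long b) {
        long r = a + b;
        if (((a ^ r) & (b ^ r)) < 0) {
            // a real impl would promote to Bignum here instead
            throw new ArithmeticException("long overflow");
        }
        return r;
    }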

> Immutable data structures have the very nice GC property that old
> things are never pointing to younger things. In addition, never being
> updated means they need not be volatile or have their access wrapped
> in locks. Yes, there is ephemeral garbage, and making that cheaper is
> always good. As Attila mentions in another message, being able to
> communicate about immutability to the runtime, and have it leverage
> that information, would be a huge win.

I think it would be helpful even for mutable systems, since even
there we have a lot of immutable state and no way to tell the JVM
about it.
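
About the closest we can get today is 'final' on fields, as below. It
expresses the intent, but it only covers the fields themselves, says
nothing about array contents, and as far as I know the JIT is pretty
conservative about trusting instance finals anyway:

    // 'final' is roughly all we can say: these fields can't be
    // reassigned, but nothing marks the array's *contents* as
    // immutable.
    final class Point {
        final double x;
        final double y;
        final double[] history;  // reference is final; elements aren't

        Point(double x, double y, double[] history) {
            this.x = x;
            this.y = y;
            this.history = history;
        }
    }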

It seems a lot of this comes down to not having (or not knowing) the
right tools to investigate hidden GC/memory/alloc/cache effects
resulting from the way our languages are put together and the way the
JVM compiles them. My tools of choice have been PrintGCDetails for
rough information and PrintOptoAssembly to see what asm is actually
coming out, and I'm now starting to get into the ideal graph data and
visualizer to understand the optimization decisions HotSpot is
making. None of those go very far toward measuring the impact of the
irreducible allocations for heap scopes and numerics, but the
less-than-100% CPU utilization I see for most numeric algorithms in
JRuby tells me there's a lot of room to improve.
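
For anyone who wants to poke at the same things, these are roughly
the invocations involved -- with the caveat that PrintOptoAssembly
and the ideal graph output generally want a debug/fastdebug HotSpot
build, and "MyBench" is just a placeholder:

    # Coarse GC behavior
    java -XX:+PrintGCDetails MyBench

    # Final assembly out of the C2 compiler (debug/fastdebug builds)
    java -XX:+PrintOptoAssembly MyBench

    # Dump the ideal graph for the Ideal Graph Visualizer (debug builds)
    java -XX:PrintIdealGraphLevel=2 -XX:PrintIdealGraphFile=graph.xml MyBench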

- Charlie
