This is a question about Clojure performance on the JVM.  There might be
similar but different tweaks on the CLR or for ClojureScript, but I'm only
curious about those if someone knows how to achieve the desired performance
improvements today.

I can give more concrete examples if there is interest, but a common
pattern in Clojure code is that objects are allocated, used for a very
short time, then become garbage.  For example, seq'ing over a collection
often does this.

Now I recall Rich Hickey mentioning in one of his talks that modern JVMs
handle this very well, by making the deallocation just a bump of a
pointer.  That might be true, but how even that is implemented can make a
significant difference in performance.

Case 1: Many objects are being allocated, initialized, used, and then soon
become garbage.  The GC is tuned to wait until 32 MB of this stuff has
been allocated before it bumps its pointer and makes it possible for all of
that memory to be reused for allocation of new objects (32 MB is just
an example I pulled out of the air -- I'm not claiming that is what actual
JVM GCs often do).  This is good in that the memory can very efficiently be
made available again, but note what happened in the meantime: the
processor was initializing the data in its local cache, and as that cache
filled up, it wrote this garbage out to main memory.  That write-back is
limited by the bandwidth from the local cache to main memory, which is
often lower than the bandwidth from the processor core to the local
cache, _especially_ if multiple processor cores are doing this in
parallel.

Case 2: Same as case 1, except the GC is somehow tuned to mark memory of
garbage objects as reallocatable after only 512 KB of it has been
allocated, and also so that this 512 KB will be the first to be used for
new allocations afterwards.  This 512 KB of memory keeps getting reused for
allocating new objects, and because it is smaller than the local processor
cache, it _stays_ there, and doesn't get written out to main memory (until
some time later, say after the current inner loop is finished and the
memory access pattern changes significantly).

I believe case 1 is the common case for default JVM GC configurations, at
least the few that I've tried.  I'd like to see if there are already easy
ways to tweak a few command line options and make it work like case 2, and
measure the performance difference.
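
To make the measurement concrete, here is a minimal sketch of the kind of
test I have in mind, written in plain Java so the allocation pattern is
explicit.  The class name ShortLivedAlloc, the 64-byte object size, and the
iteration count are all made up for illustration, and I am only assuming
that a small young generation approximates case 2.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class ShortLivedAlloc {
    // Holding the most recent array in a static field makes each array
    // escape, so the JIT cannot remove the allocation; the previous
    // array becomes garbage on the next iteration.
    static byte[] last;

    // Allocate many small, short-lived arrays and return a checksum.
    public static long churn(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] tmp = new byte[64];      // hypothetical object size
            tmp[i & 63] = (byte) i;
            sum += tmp[i & 63];
            last = tmp;
        }
        return sum;
    }

    // Total number of collections reported so far by all collectors.
    public static long gcCount() {
        long n = 0;
        for (GarbageCollectorMXBean gc :
                ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount();
            if (c > 0) {
                n += c;                     // -1 means "not available"
            }
        }
        return n;
    }

    public static void main(String[] args) {
        long gcBefore = gcCount();
        long start = System.nanoTime();
        long sum = churn(50_000_000);       // arbitrary iteration count
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("checksum=" + sum
                + " elapsedMs=" + elapsedMs
                + " collections=" + (gcCount() - gcBefore));
    }
}
```

The idea would be to compare something like "java -Xmn512k -verbose:gc
ShortLivedAlloc" against "java -Xmn32m -verbose:gc ShortLivedAlloc".  -Xmn
sets the young generation size on HotSpot, but the JVM may round or adjust
very small values, and survivor spaces take a share of it, so I would not
assume the eden space ends up at exactly the size requested.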

I know that generational garbage collection makes an explicit distinction
between objects that become garbage "young" vs. those that live for much
longer, but even then it might require tuning of the GC parameters to
switch from case 1 behavior to case 2 behavior.  I believe there has been
research on "cache-aware" garbage collection for many-core processors (e.g.
[1], [2]), but I want to know whether such things exist in deployed JVMs
today (or ones coming soon).

[1] Jin Zhou and Brian Demsky, "Cache-Aware Many-Core Garbage Collection",
http://demsky.eecs.uci.edu/publications/cacheawaregc.pdf

[2] http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.175.1600

Thanks,
Andy
