Marcin Mielżyński wrote:
Hi,
It seems yesterday's suspicions were confirmed: there is a HUGE bottleneck
in HotSpot with switch statements.
A simple method (with a switch) is compiled natively until it has more
than 11 switch cases (it is always run with 1 as an argument).
Here are two responses from folks at Sun... I can relay any further results
back to them.
John Rose:
If it's a very complex method, then a global optimizer like C2's may
have problems that a template-based compiler will not.
It will bail out and interpret very large methods, rather than risking
an OOM during compilation.
The upcoming tiered compilation system will back off to C1 instead of
the interpreter.
Try running with -XX:+PrintCompilation to see what's going on. (See below.)
To get endless reams of information, use -XX:+UnlockDiagnosticVMOptions
-XX:+LogCompilation.
I'd be happy to look at the output. I'd appreciate output from both
JDK7 and a previous version, so I can compare.
The best fix might be to hand the optimizer smaller methods.
Consider handling the most frequent bytecodes inline, and the rest
out-of-line.
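John's suggestion of handling the most frequent bytecodes inline and the rest out-of-line might look like the following minimal Java sketch. The tiny stack machine, its opcode numbering, and the `run`/`executeRare` names are all hypothetical; the point is only that the hot dispatch method stays small, while rare opcodes live in a separate method that the VM can compile (or leave interpreted) on its own.

```java
class SplitDispatch {
    // Hypothetical opcodes for a tiny stack machine (illustration only).
    static final int PUSH = 0, ADD = 1, HALT = 2, MUL = 3, NEG = 4, DUP = 5;

    // Hot loop: only the frequent opcodes are handled inline,
    // which keeps this method's bytecode small.
    static int run(int[] code) {
        int[] stack = new int[16];
        int sp = 0;
        int pc = 0;
        while (true) {
            int op = code[pc++];
            switch (op) {
                case PUSH:
                    stack[sp++] = code[pc++];
                    break;
                case ADD:
                    sp--;
                    stack[sp - 1] += stack[sp];
                    break;
                case HALT:
                    return stack[sp - 1];
                default:
                    // Everything else goes out-of-line; returns the new stack pointer.
                    sp = executeRare(op, stack, sp);
                    break;
            }
        }
    }

    // Cold path: rarely executed opcodes live in their own method.
    static int executeRare(int op, int[] stack, int sp) {
        switch (op) {
            case MUL:
                sp--;
                stack[sp - 1] *= stack[sp];
                return sp;
            case NEG:
                stack[sp - 1] = -stack[sp - 1];
                return sp;
            case DUP:
                stack[sp] = stack[sp - 1];
                return sp + 1;
            default:
                throw new IllegalStateException("unknown opcode " + op);
        }
    }
}
```

Which opcodes count as "frequent" would have to come from profiling the real workload; the split shown here is arbitrary.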
Tom Rodriguez:
One obvious problem with the micro benchmark is that if testBench is
small enough to be inlined into its caller, then it's possible to prove
that i == 1, which makes all the work constant-fold away. If you use
"static int one = 1" to produce one instead, you can see that it doesn't
fold away anymore and the performance of the different switch sizes is
similar. I get around 140 for the 64-element switch and 70 for the
11-element switch. If I disable inlining I get 140 and 100, so inlining
your test case into the harness part is distorting the numbers, probably
because n is between 1 and 10, so the call overhead is significant. How
representative is this test case of what's actually occurring?
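Tom's fix can be sketched in Java roughly as follows, assuming a harness shaped like the original benchmark (whose code is not shown here). Only `testBench` and the `static int one = 1` field come from the message; the switch body, `runLoop`, and the return values are hypothetical. Reading the argument from a non-final static field means the JIT cannot treat it as a compile-time constant, so the switch cannot be resolved away even if `testBench` is inlined.

```java
class SwitchBench {
    // Non-final static field: the JIT cannot assume its value is 1, so even
    // if testBench is inlined into runLoop the switch cannot be constant-folded.
    // (A "static final int one = 1" would be a compile-time constant and fold
    // exactly like the literal 1.)
    static int one = 1;

    // Hypothetical stand-in for the benchmarked method; the numbers quoted
    // above are for 11-case and 64-case versions.
    static int testBench(int i) {
        switch (i) {
            case 0: return 10;
            case 1: return 20;
            case 2: return 30;
            case 3: return 40;
            default: return -1;
        }
    }

    // Calls testBench with the value read from the field, not the literal 1.
    static long runLoop(int iterations) {
        long sum = 0;
        for (int k = 0; k < iterations; k++) {
            sum += testBench(one);
        }
        return sum;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long result = runLoop(100_000_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // Print the result so the loop's work is observably used.
        System.out.println("result=" + result + " time=" + elapsedMs + " ms");
    }
}
```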
C2 either generates a real jump table (which has a fixed cost) if there
are more than 18 items in a switch, or it generates a binary decision
tree of ifs. C1 does a similar decision tree, though I think it's
generated slightly differently. If you are using a switch, you are
implying that all the cases are of a similar frequency. If that's not
true, or if you know something about the frequency, you might be better
off emitting ifs for the common cases. I don't know why there would be
a jdk6 vs. jdk7 difference, though.
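As a conceptual model only (this is not C2's actual code generation), the two lowering strategies described above can be contrasted in Java: a dense switch becomes one range check plus an indexed lookup regardless of the number of cases, while a balanced decision tree over sorted case keys behaves like binary search and needs roughly log2(n) comparisons. The class, method names, and the probe counter are hypothetical.

```java
class SwitchLowering {
    // Number of key comparisons made by the most recent decisionTree call.
    static int lastProbeCount = 0;

    // Dense switch over keys 0..n-1: one range check plus one indexed load,
    // so the cost does not grow with the number of cases.
    static int jumpTable(int[] table, int key) {
        if (key < 0 || key >= table.length) {
            return -1; // default case
        }
        return table[key];
    }

    // Decision tree over sorted case keys: a balanced tree of if/else tests
    // behaves like binary search, so it needs about log2(n) comparisons.
    static int decisionTree(int[] sortedKeys, int[] values, int key) {
        int lo = 0, hi = sortedKeys.length - 1;
        lastProbeCount = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            lastProbeCount++;
            if (key < sortedKeys[mid]) {
                hi = mid - 1;
            } else if (key > sortedKeys[mid]) {
                lo = mid + 1;
            } else {
                return values[mid];
            }
        }
        return -1; // default case
    }
}
```

With 64 sorted keys, a lookup of the smallest key takes 6 comparisons in `decisionTree`, versus a single indexed load in `jumpTable`.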
One issue with C2 is that even though we collect information about the
frequency of branches in switch statements, we don't optimize based on
it. It probably wouldn't be hard to pick the most frequent parts of the
switch and guard them separately if the distribution is very uneven.
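Done by hand at the source level, the transformation described above (guarding the most frequent case ahead of the switch) could look like this minimal sketch. It assumes, hypothetically, that profiling showed key 7 dominates; `plain` and `guarded` are illustrative names, and the two methods return the same result for every input.

```java
class GuardedSwitch {
    // Plain version: every key goes through the same switch.
    static int plain(int key) {
        switch (key) {
            case 0: return 0;
            case 1: return 100;
            case 2: return 200;
            case 3: return 300;
            case 4: return 400;
            case 5: return 500;
            case 6: return 600;
            case 7: return 700;
            case 8: return 800;
            case 9: return 900;
            default: return -1;
        }
    }

    // Guarded version: the (assumed) hot key is tested first with a single
    // compare-and-branch before falling into the switch.
    static int guarded(int key) {
        if (key == 7) {
            return 700;
        }
        // Remaining cases stay in the switch; case 7 can no longer reach it.
        switch (key) {
            case 0: return 0;
            case 1: return 100;
            case 2: return 200;
            case 3: return 300;
            case 4: return 400;
            case 5: return 500;
            case 6: return 600;
            case 8: return 800;
            case 9: return 900;
            default: return -1;
        }
    }
}
```

Whether the guard actually helps depends on the real distribution and on how the JIT lowers the remaining switch, so both versions are worth measuring.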