Marcin Mielżyński wrote:
Hi,

It seems yesterday's suspicions are confirmed and there is a HUGE bottleneck in HotSpot with switch statements. A simple method (with a switch) is natively compiled only until it has more than 11 switch cases (it is always run with 1 as the argument):

Here are two responses from folks at Sun. I can relay any further results back to them.

John Rose:

If it's a very complex method, then a global optimizer like C2's may have problems that a template-based compiler will not. C2 will bail out and leave very large methods to the interpreter rather than risk an OOM during compilation. The upcoming tiered compilation system will back off to C1 instead of the interpreter.

Try running with -XX:+PrintCompilation to see what's going on.  (See below.)

To get endless reams of information, use -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation.

I'd be happy to look at the output. I'd appreciate output from both JDK7 and a previous version, so I can compare.

The best fix might be to hand the optimizer smaller methods.
Consider handling the most frequent bytecodes inline, and the rest out-of-line.
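
A minimal sketch of that inline/out-of-line split, in Java. The opcode names, the choice of which opcodes count as "frequent", and the method names are all hypothetical and are not taken from any real interpreter; the point is only that the hot method stays small and the rarely used cases live in a separate method.

```java
// Sketch only: opcodes and handlers below are invented for illustration.
final class DispatchSketch {
    static final int OP_LOAD = 0;
    static final int OP_ADD  = 1;
    static final int OP_SUB  = 2;
    static final int OP_MUL  = 3;
    static final int OP_NEG  = 4;

    // Hot path: only the (assumed) most frequent opcodes are handled here,
    // so the method handed to the optimizing compiler stays small.
    static int execute(int opcode, int acc, int operand) {
        switch (opcode) {
            case OP_LOAD: return operand;
            case OP_ADD:  return acc + operand;
            default:      return executeRare(opcode, acc, operand);
        }
    }

    // Cold path: everything else is handled out of line.
    static int executeRare(int opcode, int acc, int operand) {
        switch (opcode) {
            case OP_SUB: return acc - operand;
            case OP_MUL: return acc * operand;
            case OP_NEG: return -acc;
            default:
                throw new IllegalArgumentException("unknown opcode " + opcode);
        }
    }

    public static void main(String[] args) {
        int acc = execute(OP_LOAD, 0, 5);  // frequent opcode, handled inline
        acc = execute(OP_ADD, acc, 2);     // frequent opcode, handled inline
        acc = execute(OP_MUL, acc, 3);     // rare opcode, handled out of line
        System.out.println(acc);
    }
}
```

Whether this helps in a given case depends on the real opcode distribution, so it would need to be measured rather than assumed.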

Tom Rodriquez:

One obvious problem with the micro benchmark is that if testBench is small enough to be inlined into its caller then it's possible to prove that i == 1, which makes all the work constant fold away. If you use "static int one = 1" to produce one instead, you can see that it doesn't fold away anymore and the performance of the different switch sizes is similar. I get around 140 for the 64 element switch or 70 for the 11 element switch. If I disable inlining I get 140 and 100, so inlining your test case into the harness part is distorting the numbers, probably because n is between 1 and 10 so the call overhead is significant. How representative is this test case of what's actually occurring?
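
For reference, the "static int one = 1" change could look roughly like the sketch below. The original benchmark is not reproduced in this message, so the switch body, the run method and the iteration count are stand-ins; only the name testBench and the non-final static field come from the description above.

```java
// Sketch only: the switch body and harness are hypothetical stand-ins
// for the original benchmark.
final class SwitchBench {
    // Deliberately NOT final: the argument is read from a mutable static
    // field, so it should not be folded to a compile-time constant even
    // if testBench is inlined into its caller.
    static int one = 1;

    // A switch with 12 cases, i.e. just past the 11-case threshold
    // mentioned in the original report.
    static int testBench(int i) {
        switch (i) {
            case 1:  return 10;
            case 2:  return 20;
            case 3:  return 30;
            case 4:  return 40;
            case 5:  return 50;
            case 6:  return 60;
            case 7:  return 70;
            case 8:  return 80;
            case 9:  return 90;
            case 10: return 100;
            case 11: return 110;
            case 12: return 120;
            default: return 0;
        }
    }

    // Calls testBench repeatedly and accumulates the results so the
    // calls cannot be discarded as dead code.
    static long run(int iterations) {
        long sum = 0;
        for (int k = 0; k < iterations; k++) {
            sum += testBench(one);
        }
        return sum;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long sum = run(10000000);
        long elapsedMs = (System.nanoTime() - start) / 1000000;
        System.out.println("sum=" + sum + " elapsed=" + elapsedMs + "ms");
    }
}
```

Running this with -XX:+PrintCompilation, as suggested above, should show whether testBench and run actually get compiled.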

C2 generates either a real jump table, which has a fixed cost, if there are more than 18 items in a switch, or a binary decision tree of ifs. C1 does a similar decision tree, though I think it's generated slightly differently. If you are using a switch you are implying that all the cases are of a similar frequency. If that's not true, or if you know something about the frequency, you might be better off emitting ifs for the common cases. I don't know why there would be a jdk6 vs. jdk7 difference though.
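
As an illustration of emitting ifs for the common cases, here is a small sketch. The case values, the return values and the assumption that case 7 is the hot one are all made up; in practice the hot cases would come from profiling the real workload.

```java
// Sketch only: HOT_CASE and the handler values are hypothetical.
final class GuardedSwitch {
    static final int HOT_CASE = 7;  // assumed to dominate at run time

    // Plain switch: every case is treated as equally likely.
    static int plain(int op) {
        switch (op) {
            case 0: return 0;
            case 1: return 100;
            case 2: return 200;
            case 3: return 300;
            case 4: return 400;
            case 5: return 500;
            case 6: return 600;
            case 7: return 700;
            case 8: return 800;
            case 9: return 900;
            default: return -1;
        }
    }

    // Guarded variant: the assumed-hot case is tested first with a plain
    // if, and only the remaining values go through the switch.
    static int guarded(int op) {
        if (op == HOT_CASE) {
            return 700;
        }
        return plain(op);
    }

    public static void main(String[] args) {
        // Both variants are meant to return the same value for every input.
        System.out.println(plain(7) == guarded(7));
        System.out.println(plain(3) == guarded(3));
    }
}
```

The guard only pays off if the frequency assumption holds; with an even distribution it just adds one extra comparison per call.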

One issue with C2 is that even though we collect information about the frequency of branches in switch statements, we don't optimize based on it. It probably wouldn't be hard to pick the most frequent parts of the switch and guard them separately if the distribution is very uneven.

