On Saturday, April 8, 2017 at 9:40:46 AM UTC-7, Kirk Pepperdine wrote:
>
>
> >>> 
> >>> - Your mySleep won't actually do what you think it does. The entire 
> method can be optimized away to nothing after inlining at the call site by 
> the JIT once the calls to it actually warm up enough, since it has no side 
> effects and nothing is done with its return code. 
> >> 
> >> Well, this won’t happen in OpenJDK because of the return value. 
> > 
> > The return value "saves" you only as long as the method doesn't get 
> inlined. After it is inlined, the fact that the return value isn't used 
> allows the JIT to kill the entire code… 
>
> You’d think but not in my experience. 
>

Stock OpenJDK currently inlines and completely eliminates:

  public static int wasteSomeTime(int t) {
    int x = 0;
    for (int i = 0; i < t * 10000; i++) {
      x += (t ^ x) % 93;
    }
    return x;
  }

When called like this:

  wasteSomeTime(sleepArg);


So return values demonstrably don't prevent the optimization...


The optimization will not happen, however, when inlining of the method at the call site is prevented. 


I built a small set of jmh benchmarks to demonstrate this 
<https://github.com/giltene/GilExamples/blob/master/CodeGenExample-benchmarks/src/main/java/bench/MethodInliningExampleBench.java>.
They result in this:


Benchmark                                             (benchLoopCount)  (sleepArg)   Mode  Cnt           Score           Error  Units
MethodInliningExampleBench.noRetValIntLoop                      100000           1  thrpt    5  2830940580.903 ±  52900090.474  ops/s
MethodInliningExampleBench.noRetValIntLoopNoInlining            100000           1  thrpt    5        5500.356 ±       245.758  ops/s
MethodInliningExampleBench.retValIntLoop                        100000           1  thrpt    5  2877030926.237 ± 134788500.109  ops/s
MethodInliningExampleBench.retValIntLoopNoInlining              100000           1  thrpt    5           0.219 ±         0.007  ops/s



This demonstrates that when inlining is **prevented** at the caller, there 
is a real difference between having a return value and not: the loop in the 
method gets optimized away only when there is no return value. But when 
inlining is not prevented at the caller and the return value is not used, 
both cases get optimized away the same way. 

And since it is "hard" to reliably disallow inlining (without e.g. using 
Aleksey's cool @CompilerControl(CompilerControl.Mode.DONT_INLINE) 
annotation in jmh), inlining can bite you and wreck your assumptions at 
any time...
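Aside: one jmh-free way to keep a result alive, regardless of whether the 
call gets inlined, is to sink it into a volatile field. The volatile store 
is an observable side effect the JIT must preserve, so the computation 
feeding it can't be treated as dead (jmh's Blackhole.consume is the robust, 
battle-tested version of this idea). A minimal sketch, with made-up class 
and field names:

```java
public class SinkExample {
    // A volatile store is an observable side effect, so the JIT must keep
    // alive any computation whose result flows into it.
    public static volatile long sink;

    public static long wasteSomeTime(long t) {
        long x = 0;
        for (int i = 0; i < t * 10000; i++) {
            x += (t ^ x) % 93;
        }
        return x;
    }

    public static void main(String[] args) {
        // The result is consumed by the volatile store, so the call
        // cannot be eliminated as dead code.
        sink = wasteSomeTime(1);
        System.out.println(sink);
    }
}
```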

Interestingly, as you can see from the same jmh tests above, while stock 
OpenJDK will optimize away the above code, it *currently* won't optimize 
away this code:

  public static long mySleepL1(long t) {
    long x = 0;
    for(int i = 0; i < t * 10000; i++) {
      x += (t ^ x) % 93;
    }
    return x;
  }

Which differs only in using longs instead of ints.

The results for the longs tests are:

Benchmark                                              (benchLoopCount)  (sleepArg)   Mode  Cnt           Score           Error  Units
MethodInliningExampleBench.noRetValLongLoop                      100000           1  thrpt    5  2924098828.778 ± 234409260.906  ops/s
MethodInliningExampleBench.noRetValLongLoopNoInlining            100000           1  thrpt    5           0.243 ±         0.013  ops/s
MethodInliningExampleBench.retValLongLoop                        100000           1  thrpt    5           0.254 ±         0.014  ops/s
MethodInliningExampleBench.retValLongLoopNoInlining              100000           1  thrpt    5           0.246 ±         0.012  ops/s



So using longs seems to defeat some of the *current* OpenJDK 
optimizations. But how much would you want to bet on that staying the same 
in the next release? 

Similarly, *current* stock OpenJDK won't recognize that System.nanoTime() 
and System.currentTimeMillis() have no side effects, so the original 
example method:
 
    public static long mySleep(long t) {
        long x = 0;
        for(int i = 0; i < t * 10000; i++) {
            x += System.currentTimeMillis() / System.nanoTime();
        }
        return x;
    }

will not be optimized away at the call site on *current* OpenJDK builds. But 
this can change at any moment, as new optimizations and metadata about 
intrinsics are added in coming versions, or with better optimizing JITs.
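For what it's worth, if the goal is actually to pause for a while, the 
robust route is a real blocking call rather than a busy loop: Thread.sleep 
(or LockSupport.parkNanos) goes through the runtime with observable side 
effects, so no amount of JIT cleverness can optimize it away. A minimal 
sketch (sleepMillis is just an illustrative name):

```java
public class RealSleep {
    // Thread.sleep is a blocking runtime call with real side effects,
    // so the JIT can never eliminate it the way it can a dead busy-loop.
    public static void sleepMillis(long millis) throws InterruptedException {
        Thread.sleep(millis);
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        sleepMillis(50);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        System.out.println("slept ~" + elapsedMs + " ms");
    }
}
```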

In all these cases, dead code *might* be removed. And whether or not it 
is can depend on the length of the run, the data you use, the call site, 
the phase of the moon 🌙 , or the version of the JDK or JIT that happens to 
run your code. Any form of comparison (between call sites, versions, etc.) 
with such dead code involved is flaky, and will often lead to "surprising" 
conclusions. Sometimes those surprising conclusions happen right away. 
Sometimes they happen a year later, when you test again using your 
previously established, tried-and-tested, based-on-experience tests that no 
longer do what you think they do...

E.g. I'm fairly sure OpenJDK will at some point (soon?) need to catch its 
optimizations on longs up to match those on ints in many cases (who 
uses ints anymore, except for array indexing?), which will probably break a 
lot of benchmarks out there that may be inadvertently relying on long 
optimizations not happening.


-- 
You received this message because you are subscribed to the Google Groups 
"mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.
