Looks like you run long computation inside the static initializer <clinit>, 
right?
The problem is that HotSpot cannot fully optimize static calls when the 
holder class is not completely initialized. Moving computation to a 
separate class is the right way to deal with such cases.

The situation got even worse in recent JDK updates. There was a zero day 
bug https://bugs.openjdk.java.net/browse/JDK-8215634 that could result in 
invocation of a static method of uninitialized class in violation of JVMS. 
The bug was fixed in JDK 8u201 and JDK 11.0.2. However, the fix is terrible 
from performance perspective: now the invocation of a static method of 
uninitialized class goes through the slow path, in particular, through 
resolve_static_call() JVM runtime function.

So, the recommendation is simple: move heavy logic out of a class being 
initialized.

- Andrei 

четверг, 7 февраля 2019 г., 1:54:29 UTC+3 пользователь Shevek написал:
>
> Hmm... "Justify/explain your beliefs." Fair, I'm game: 
>
> I think I'm sure that my methods are both JIT'd because perf-java-flames 
> shows them both in green, and I think that's derived from some prefix on 
> the symbol name where the JVM (via perf) says it's JIT'd. If it weren't 
> JIT'd, I'd see it listed in red as "Interpreter". 
>
> I waited for the system to warm up before grabbing the flame-chart, and 
> on one run, I was too early, so I actually saw one symbol shift from 
> interpreted to JIT. I have not used jitwatch, but the signs I am looking 
> at seem reliable. The interpreter JIT'd the inner method first, then the 
> outer one, as expected based on call frequency. 
>
> I've never seen this resolve_static_call method show up before in any of 
> my profiling of Java code. 
>
>  From that PoV, it looks as if the interpreter is doing something odd 
> with method call between the two methods, despite that it should just be 
> a static linkage. Like it JIT'd both methods, but didn't JIT the call, 
> or it's using some interpreter-path code in the call? So is there a 
> linkage limit based on #methods in the class, or something? Because 
> making the methods non-static ALSO moved them into a new class which has 
> only 2 methods... ok, new experiment... 
>
> For what it's worth, I seem to be really struggling with JDK 1.8.1b191 
> performance on the new Xeon hardware for other reasons, too. I'm seeing 
> perf saying pthread_cond_wait -> native_write_msr is taking 50% of 
> runtime, and not even sure where to start with that except limit the 
> life of any JVM to 6 hours and restart it. I kind of want to blame a 
> Kernel / PMU change but it only affects the JVM. 
>
> Caveat: I don't do JVM internals, I'm mostly a JLS-layer muggle. 
>
> S. 
>
> On 2/4/19 10:53 PM, Todd Lipcon wrote: 
> > On Mon, Feb 4, 2019 at 9:13 PM Shevek <[email protected] <javascript:> 
> > <mailto:[email protected] <javascript:>>> wrote: 
> > 
> >     This isn't a JIT issue. According to perf-java-flames, all my code 
> DID 
> >     get jitted. The overhead is entirely calls to this mystery 
> >     resolve_static_call function, so it looks like a static method 
> lookup 
> >     issue in the JVM. The shape of the stack profile makes it look as if 
> >     something is recursive, too. 
> > 
> > 
> > Are you sure? From the code it certainly looks like 
> > 'resolve_static_call' is part of the interpreter code path. 
> > 
> > -Todd 
> > 
> > 
> >     On 2/4/19 8:01 PM, Todd Lipcon wrote: 
> >      > Tried looking at LogCompilation output with jitwatch? It's been 
> >     helpful 
> >      > for me in the past to understand why something wouldn't get 
> jitted. 
> >      > 
> >      > Todd 
> >      > 
> >      > On Mon, Feb 4, 2019, 7:54 PM Shevek <[email protected] 
> <javascript:> 
> >     <mailto:[email protected] <javascript:>> 
> >      > <mailto:[email protected] <javascript:> <mailto:
> [email protected] <javascript:>>> wrote: 
> >      > 
> >      >     Update: I now think this is slow (but not AS slow) on the 
> >     Core i7-5600U 
> >      >     so this may be a regression from _181 to _191, and not 
> entirely 
> >      >     CPU-dependent? 
> >      > 
> >      >     Wrapping the two static methods in an otherwise-pointless 
> >     class, and 
> >      >     calling them as instance methods made the code much faster. 
> >      > 
> >      >     Is it relevant that the class in question is 522419 Kb in 
> >     size and 
> >      >     contains 1696 (mostly instance) methods? No individual method 
> >     in it is 
> >      >     larger than 8K, so they all JIT. 
> >      > 
> >      >     The outer readVarintTable method is called about 100K-500K 
> >     times, so 
> >      >     there's plenty of chance to replace it. 
> >      > 
> >      >     No synchronization is used. 
> >      > 
> >      >     I'm still in "WAT?" territory. 
> >      > 
> >      >     S. 
> >      > 
> >      >     On 2/4/19 6:26 PM, Shevek wrote: 
> >      >      > Hi, 
> >      >      > 
> >      >      > I have a very simple routine which, on some JVMs/systems, 
> >     which I 
> >      >     have 
> >      >      > not yet entirely narrowed down, suffers a 50x slowdown. 
> >     The code is 
> >      >      > included below. 
> >      >      > 
> >      >      > In perf-java-flames, I see: 
> >      >      > 
> >      >      > <clinit> -> readVarintTable (90%), of which: 
> >      >      > readVarintTable -> readVarint (4%) 
> >      >      > readVarintTable -> resolve_static_call -> libjvm.so (86%) 
> >     <-- THE 
> >      >     WAT? 
> >      >      > 
> >      >      > So what is a perfectly trivial method doing spending 90% 
> >     of it's 
> >      >     time 
> >      >      > inside resolve_static_call? What's going on? 
> >      >      > 
> >      >      > My google searches turned up a note to do with loop 
> >     unrolling and 
> >      >     some 
> >      >      > optimizations breaking for static methods, so I will try 
> >     this with 
> >      >      > non-static methods. 
> >      >      > 
> >      >      > Slow on openjdk 1.8.0_191, Xeon E-2176M (Lenovo P1 laptop, 
> >     12-thread) 
> >      >      > Fast on openjdk 1.8.0_191, Core i7-5600U (Lenovo T550 
> laptop, 
> >      >     4-thread) 
> >      >      > I think Fast on Xeon E5620 (Supermicro rack, 8 thread). 
> >      >      > I think Slow on AMD Epyc 7301 16-core, 64-thread. Will 
> >      >     investigate more. 
> >      >      > 
> >      >      > Knocking off the obvious: 
> >      >      > * It's not doing meaningful amounts of allocation, and no 
> GC. 
> >      >      > * Total data size is 100M-1G. 
> >      >      > * Both machines running same code, same dataset, ... 
> >      >      > * This is single-threaded, and runs early in the JVM 
> startup. 
> >      >      > * It's doing I/O over JAR-resource -> BufferedInputStream 
> -> 
> >      >      > DataInputStream but it's not I/O contended, based on the 
> >     calls in 
> >      >     the 
> >      >      > flamegraph. 
> >      >      > 
> >      >      > But I feel that a 50x slowdown in an unexplained native 
> call 
> >      >     because of 
> >      >      > ... what, the number of cores ... bears some explanation. 
> >     I can 
> >      >     post the 
> >      >      > flamegraphs if that helps. 
> >      >      > 
> >      >      > And here is the code, which is as boring as anything, so 
> >     what gives: 
> >      >      > 
> >      >      > 
> >      >      >      /** Reads a little-endian varint with no optimization 
> for 
> >      >     negative 
> >      >      > numbers. */ 
> >      >      >      private static int readVarint(@Nonnull 
> >     DataInputStream in) 
> >      >     throws 
> >      >      > IOException { 
> >      >      >          int result = 0; 
> >      >      >          for (int shift = 0; shift < 32; shift += 7) { 
> >      >      >              int b = in.read(); 
> >      >      >              if (b == -1) 
> >      >      >                  throw new EOFException("Truncated varint 
> in 
> >      >     stream."); 
> >      >      >              result |= (b & 0x7f) << shift; 
> >      >      >              if ((b & 0x80) == 0) 
> >      >      >                  return result; 
> >      >      >          } 
> >      >      >          throw new IOException("Malformed varint in 
> stream."); 
> >      >      >      } 
> >      >      > 
> >      >      >      @Nonnull 
> >      >      >      private static int[] readVarintTable(@Nonnull 
> >      >     DataInputStream in, 
> >      >      > @Nonnegative int sublength) throws IOException { 
> >      >      >          int[] out = new int[readVarint(in) * sublength]; 
> >      >      >          for (int i = 0; i < out.length; i++) 
> >      >      >              out[i] = readVarint(in); 
> >      >      >          return out; 
> >      >      >      } 
> >      >      > 
> >      >      >      static { 
> >      >      >          try { 
> >      >      >              DataInputStream in = new DataInputStream( 
> >      >      >                  new BufferedInputStream( 
> >      >      >            Parser.class.getResourceAsStream("Parser.dat") 
> >      >      >                  ) 
> >      >      >              ); 
> >      >      >                          table = readVarintTable(in); 
> >      >      >                  } // etc 
> >      >      > 
>

-- 
You received this message because you are subscribed to the Google Groups 
"mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Reply via email to