It looks like you run a long computation inside the static initializer <clinit>, right? The problem is that HotSpot cannot fully optimize static calls while the holder class is not completely initialized. Moving the computation to a separate class is the right way to deal with such cases.
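A minimal sketch of that idiom, under the assumption that it matches the code quoted further down this thread (the class name VarintReader and the stand-in byte array are illustrative, not from the thread): the hot static methods move into their own class with a trivial <clinit>, so by the time the table-reading loop runs, the class that owns the methods is fully initialized, even while the caller's <clinit> is still on the stack.

```java
import java.io.*;

// Illustrative helper class (name is mine, not from the thread). Its own
// <clinit> is empty, so it finishes initializing the moment it is first used.
class VarintReader {
    static int readVarint(DataInputStream in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 32; shift += 7) {
            int b = in.read();
            if (b == -1)
                throw new EOFException("Truncated varint in stream.");
            result |= (b & 0x7f) << shift;
            if ((b & 0x80) == 0)
                return result;
        }
        throw new IOException("Malformed varint in stream.");
    }

    static int[] readVarintTable(DataInputStream in, int sublength) throws IOException {
        int[] out = new int[readVarint(in) * sublength];
        for (int i = 0; i < out.length; i++)
            out[i] = readVarint(in);
        return out;
    }
}

public class Parser {
    static final int[] table;

    static {
        // Stand-in for the real resource stream: count=2, then values 1 and 300.
        byte[] data = {0x02, 0x01, (byte) 0xAC, 0x02};
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new ByteArrayInputStream(data)))) {
            // The hot loop now executes inside VarintReader, which is already
            // fully initialized -- not inside Parser, which is mid-<clinit>.
            table = VarintReader.readVarintTable(in, 1);
        } catch (IOException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(table));
    }
}
```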
The situation got even worse in recent JDK updates. There was a bug, https://bugs.openjdk.java.net/browse/JDK-8215634, that could result in invocation of a static method of an uninitialized class, in violation of the JVMS. The bug was fixed in JDK 8u201 and JDK 11.0.2. However, the fix is terrible from a performance perspective: the invocation of a static method of an uninitialized class now goes through the slow path, in particular through the resolve_static_call() JVM runtime function. So the recommendation is simple: move heavy logic out of a class that is being initialized.

- Andrei

On Thursday, February 7, 2019 at 1:54:29 AM UTC+3, Shevek wrote:
> Hmm... "Justify/explain your beliefs." Fair, I'm game:
>
> I think I'm sure that my methods are both JIT'd because perf-java-flames
> shows them both in green, and I think that's derived from some prefix on
> the symbol name where the JVM (via perf) says it's JIT'd. If it weren't
> JIT'd, I'd see it listed in red as "Interpreter".
>
> I waited for the system to warm up before grabbing the flame-chart, and
> on one run I was too early, so I actually saw one symbol shift from
> interpreted to JIT'd. I have not used jitwatch, but the signs I am
> looking at seem reliable. The JVM JIT'd the inner method first, then the
> outer one, as expected based on call frequency.
>
> I've never seen this resolve_static_call method show up before in any of
> my profiling of Java code.
>
> From that PoV, it looks as if the interpreter is doing something odd
> with the method call between the two methods, despite that it should
> just be a static linkage. Like it JIT'd both methods, but didn't JIT the
> call, or it's using some interpreter-path code in the call? So is there
> a linkage limit based on #methods in the class, or something? Because
> making the methods non-static ALSO moved them into a new class which has
> only 2 methods... ok, new experiment...
> For what it's worth, I seem to be really struggling with JDK 1.8.0_191
> performance on the new Xeon hardware for other reasons, too. I'm seeing
> perf saying pthread_cond_wait -> native_write_msr is taking 50% of
> runtime, and I'm not even sure where to start with that, except to limit
> the life of any JVM to 6 hours and restart it. I kind of want to blame a
> kernel / PMU change, but it only affects the JVM.
>
> Caveat: I don't do JVM internals, I'm mostly a JLS-layer muggle.
>
> S.
>
> On 2/4/19 10:53 PM, Todd Lipcon wrote:
> > On Mon, Feb 4, 2019 at 9:13 PM Shevek <[email protected]> wrote:
> > >
> > > This isn't a JIT issue. According to perf-java-flames, all my code
> > > DID get JIT'd. The overhead is entirely calls to this mystery
> > > resolve_static_call function, so it looks like a static method
> > > lookup issue in the JVM. The shape of the stack profile makes it
> > > look as if something is recursive, too.
> >
> > Are you sure? From the code it certainly looks like
> > 'resolve_static_call' is part of the interpreter code path.
> >
> > -Todd
> >
> > > On 2/4/19 8:01 PM, Todd Lipcon wrote:
> > > > Tried looking at LogCompilation output with jitwatch? It's been
> > > > helpful for me in the past to understand why something wouldn't
> > > > get JIT'd.
> > > >
> > > > Todd
> > > >
> > > > On Mon, Feb 4, 2019, 7:54 PM Shevek <[email protected]> wrote:
> > > > >
> > > > > Update: I now think this is slow (but not AS slow) on the
> > > > > Core i7-5600U, so this may be a regression from _181 to _191,
> > > > > and not entirely CPU-dependent?
> > > > >
> > > > > Wrapping the two static methods in an otherwise-pointless
> > > > > class, and calling them as instance methods, made the code much
> > > > > faster.
> > > > > Is it relevant that the class in question is 522419 Kb in size
> > > > > and contains 1696 (mostly instance) methods? No individual
> > > > > method in it is larger than 8K, so they all JIT.
> > > > >
> > > > > The outer readVarintTable method is called about 100K-500K
> > > > > times, so there's plenty of chance to replace it.
> > > > >
> > > > > No synchronization is used.
> > > > >
> > > > > I'm still in "WAT?" territory.
> > > > >
> > > > > S.
> > > > >
> > > > > On 2/4/19 6:26 PM, Shevek wrote:
> > > > > > Hi,
> > > > > >
> > > > > > I have a very simple routine which, on some JVMs/systems,
> > > > > > which I have not yet entirely narrowed down, suffers a 50x
> > > > > > slowdown. The code is included below.
> > > > > >
> > > > > > In perf-java-flames, I see:
> > > > > >
> > > > > > <clinit> -> readVarintTable (90%), of which:
> > > > > >   readVarintTable -> readVarint (4%)
> > > > > >   readVarintTable -> resolve_static_call -> libjvm.so (86%) <-- THE WAT?
> > > > > >
> > > > > > So what is a perfectly trivial method doing spending 90% of
> > > > > > its time inside resolve_static_call? What's going on?
> > > > > >
> > > > > > My google searches turned up a note to do with loop unrolling
> > > > > > and some optimizations breaking for static methods, so I will
> > > > > > try this with non-static methods.
> > > > > >
> > > > > > Slow on openjdk 1.8.0_191, Xeon E-2176M (Lenovo P1 laptop, 12-thread)
> > > > > > Fast on openjdk 1.8.0_191, Core i7-5600U (Lenovo T550 laptop, 4-thread)
> > > > > > I think Fast on Xeon E5620 (Supermicro rack, 8-thread).
> > > > > > I think Slow on AMD Epyc 7301 16-core, 64-thread. Will
> > > > > > investigate more.
> > > > > >
> > > > > > Knocking off the obvious:
> > > > > > * It's not doing meaningful amounts of allocation, and no GC.
> > > > > > * Total data size is 100M-1G.
> > > > > > * Both machines running same code, same dataset, ...
> > > > > > * This is single-threaded, and runs early in the JVM startup.
> > > > > > * It's doing I/O over JAR-resource -> BufferedInputStream ->
> > > > > >   DataInputStream, but it's not I/O contended, based on the
> > > > > >   calls in the flamegraph.
> > > > > >
> > > > > > But I feel that a 50x slowdown in an unexplained native call
> > > > > > because of ... what, the number of cores ... bears some
> > > > > > explanation. I can post the flamegraphs if that helps.
> > > > > >
> > > > > > And here is the code, which is as boring as anything, so what
> > > > > > gives:
> > > > > >
> > > > > > /** Reads a little-endian varint with no optimization for negative numbers. */
> > > > > > private static int readVarint(@Nonnull DataInputStream in) throws IOException {
> > > > > >     int result = 0;
> > > > > >     for (int shift = 0; shift < 32; shift += 7) {
> > > > > >         int b = in.read();
> > > > > >         if (b == -1)
> > > > > >             throw new EOFException("Truncated varint in stream.");
> > > > > >         result |= (b & 0x7f) << shift;
> > > > > >         if ((b & 0x80) == 0)
> > > > > >             return result;
> > > > > >     }
> > > > > >     throw new IOException("Malformed varint in stream.");
> > > > > > }
> > > > > >
> > > > > > @Nonnull
> > > > > > private static int[] readVarintTable(@Nonnull DataInputStream in,
> > > > > >         @Nonnegative int sublength) throws IOException {
> > > > > >     int[] out = new int[readVarint(in) * sublength];
> > > > > >     for (int i = 0; i < out.length; i++)
> > > > > >         out[i] = readVarint(in);
> > > > > >     return out;
> > > > > > }
> > > > > >
> > > > > > static {
> > > > > >     try {
> > > > > >         DataInputStream in = new DataInputStream(
> > > > > >                 new BufferedInputStream(
> > > > > >                         Parser.class.getResourceAsStream("Parser.dat")));
> > > > > >         table = readVarintTable(in);
> > > > > >     } // etc
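For reference, the wire format the quoted readVarint() decodes is the usual little-endian base-128 varint: 7 data bits per byte, least-significant group first, high bit set on every byte except the last. A round-trip sketch, where writeVarint is a hypothetical encoder of mine (not from the thread) and readVarint is the decoder as posted:

```java
import java.io.*;

public class VarintDemo {
    // Hypothetical encoder (not from the thread): emits 7 bits per byte,
    // least-significant group first, continuation bit set on all but the last.
    static void writeVarint(OutputStream out, int value) throws IOException {
        while ((value & ~0x7f) != 0) {
            out.write((value & 0x7f) | 0x80); // high bit set: more bytes follow
            value >>>= 7;                     // unsigned shift so negatives terminate
        }
        out.write(value);
    }

    // The decoder as posted in the thread (InputStream.read returns -1 at EOF).
    static int readVarint(DataInputStream in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 32; shift += 7) {
            int b = in.read();
            if (b == -1)
                throw new EOFException("Truncated varint in stream.");
            result |= (b & 0x7f) << shift;
            if ((b & 0x80) == 0)
                return result;
        }
        throw new IOException("Malformed varint in stream.");
    }

    public static void main(String[] args) throws IOException {
        int[] values = {0, 1, 127, 128, 300, Integer.MAX_VALUE};
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        for (int v : values)
            writeVarint(buf, v);
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray()));
        for (int v : values)
            System.out.println(v + " -> " + readVarint(in)); // each decodes to itself
    }
}
```

For example, 300 encodes as the two bytes 0xAC 0x02: low group 0x2C with the continuation bit, then the remaining bits 0x02.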
