Re: Huge, unexpected performance overhead (of static methods?)

Francesco Nigro Thu, 07 Feb 2019 23:59:22 -0800

I suspect that the build phase of an app using Graal AOT could suffer the
same limitation, but it is correct that static initializers *shouldn't* be
used for heavyweight computations given that are ensured to be called
serially...


Il giorno ven 8 feb 2019, 02:36 Andrei Pangin <[email protected]> ha
scritto:

> Looks like you run long computation inside the static initializer
> <clinit>, right?
> The problem is that HotSpot cannot fully optimize static calls when the
> holder class is not completely initialized. Moving computation to a
> separate class is the right way to deal with such cases.
>
> The situation got even worse in recent JDK updates. There was a zero day
> bug https://bugs.openjdk.java.net/browse/JDK-8215634 that could result in
> invocation of a static method of uninitialized class in violation of JVMS.
> The bug was fixed in JDK 8u201 and JDK 11.0.2. However, the fix is terrible
> from performance perspective: now the invocation of a static method of
> uninitialized class goes through the slow path, in particular, through
> resolve_static_call() JVM runtime function.
>
> So, the recommendation is simple: move heavy logic out of a class being
> initialized.
>
> - Andrei
>
> четверг, 7 февраля 2019 г., 1:54:29 UTC+3 пользователь Shevek написал:
>>
>> Hmm... "Justify/explain your beliefs." Fair, I'm game:
>>
>> I think I'm sure that my methods are both JIT'd because perf-java-flames
>> shows them both in green, and I think that's derived from some prefix on
>> the symbol name where the JVM (via perf) says it's JIT'd. If it weren't
>> JIT'd, I'd see it listed in red as "Interpreter".
>>
>> I waited for the system to warm up before grabbing the flame-chart, and
>> on one run, I was too early, so I actually saw one symbol shift from
>> interpreted to JIT. I have not used jitwatch, but the signs I am looking
>> at seem reliable. The interpreter JIT'd the inner method first, then the
>> outer one, as expected based on call frequency.
>>
>> I've never seen this resolve_static_call method show up before in any of
>> my profiling of Java code.
>>
>>  From that PoV, it looks as if the interpreter is doing something odd
>> with method call between the two methods, despite that it should just be
>> a static linkage. Like it JIT'd both methods, but didn't JIT the call,
>> or it's using some interpreter-path code in the call? So is there a
>> linkage limit based on #methods in the class, or something? Because
>> making the methods non-static ALSO moved them into a new class which has
>> only 2 methods... ok, new experiment...
>>
>> For what it's worth, I seem to be really struggling with JDK 1.8.1b191
>> performance on the new Xeon hardware for other reasons, too. I'm seeing
>> perf saying pthread_cond_wait -> native_write_msr is taking 50% of
>> runtime, and not even sure where to start with that except limit the
>> life of any JVM to 6 hours and restart it. I kind of want to blame a
>> Kernel / PMU change but it only affects the JVM.
>>
>> Caveat: I don't do JVM internals, I'm mostly a JLS-layer muggle.
>>
>> S.
>>
>> On 2/4/19 10:53 PM, Todd Lipcon wrote:
>> > On Mon, Feb 4, 2019 at 9:13 PM Shevek <[email protected]
>> > <mailto:[email protected]>> wrote:
>> >
>> >     This isn't a JIT issue. According to perf-java-flames, all my code
>> DID
>> >     get jitted. The overhead is entirely calls to this mystery
>> >     resolve_static_call function, so it looks like a static method
>> lookup
>> >     issue in the JVM. The shape of the stack profile makes it look as
>> if
>> >     something is recursive, too.
>> >
>> >
>> > Are you sure? From the code it certainly looks like
>> > 'resolve_static_call' is part of the interpreter code path.
>> >
>> > -Todd
>> >
>> >
>> >     On 2/4/19 8:01 PM, Todd Lipcon wrote:
>> >      > Tried looking at LogCompilation output with jitwatch? It's been
>> >     helpful
>> >      > for me in the past to understand why something wouldn't get
>> jitted.
>> >      >
>> >      > Todd
>> >      >
>> >      > On Mon, Feb 4, 2019, 7:54 PM Shevek <[email protected]
>> >     <mailto:[email protected]>
>> >      > <mailto:[email protected] <mailto:[email protected]>> wrote:
>> >      >
>> >      >     Update: I now think this is slow (but not AS slow) on the
>> >     Core i7-5600U
>> >      >     so this may be a regression from _181 to _191, and not
>> entirely
>> >      >     CPU-dependent?
>> >      >
>> >      >     Wrapping the two static methods in an otherwise-pointless
>> >     class, and
>> >      >     calling them as instance methods made the code much faster.
>> >      >
>> >      >     Is it relevant that the class in question is 522419 Kb in
>> >     size and
>> >      >     contains 1696 (mostly instance) methods? No individual
>> method
>> >     in it is
>> >      >     larger than 8K, so they all JIT.
>> >      >
>> >      >     The outer readVarintTable method is called about 100K-500K
>> >     times, so
>> >      >     there's plenty of chance to replace it.
>> >      >
>> >      >     No synchronization is used.
>> >      >
>> >      >     I'm still in "WAT?" territory.
>> >      >
>> >      >     S.
>> >      >
>> >      >     On 2/4/19 6:26 PM, Shevek wrote:
>> >      >      > Hi,
>> >      >      >
>> >      >      > I have a very simple routine which, on some JVMs/systems,
>> >     which I
>> >      >     have
>> >      >      > not yet entirely narrowed down, suffers a 50x slowdown.
>> >     The code is
>> >      >      > included below.
>> >      >      >
>> >      >      > In perf-java-flames, I see:
>> >      >      >
>> >      >      > <clinit> -> readVarintTable (90%), of which:
>> >      >      > readVarintTable -> readVarint (4%)
>> >      >      > readVarintTable -> resolve_static_call -> libjvm.so (86%)
>> >     <-- THE
>> >      >     WAT?
>> >      >      >
>> >      >      > So what is a perfectly trivial method doing spending 90%
>> >     of it's
>> >      >     time
>> >      >      > inside resolve_static_call? What's going on?
>> >      >      >
>> >      >      > My google searches turned up a note to do with loop
>> >     unrolling and
>> >      >     some
>> >      >      > optimizations breaking for static methods, so I will try
>> >     this with
>> >      >      > non-static methods.
>> >      >      >
>> >      >      > Slow on openjdk 1.8.0_191, Xeon E-2176M (Lenovo P1
>> laptop,
>> >     12-thread)
>> >      >      > Fast on openjdk 1.8.0_191, Core i7-5600U (Lenovo T550
>> laptop,
>> >      >     4-thread)
>> >      >      > I think Fast on Xeon E5620 (Supermicro rack, 8 thread).
>> >      >      > I think Slow on AMD Epyc 7301 16-core, 64-thread. Will
>> >      >     investigate more.
>> >      >      >
>> >      >      > Knocking off the obvious:
>> >      >      > * It's not doing meaningful amounts of allocation, and no
>> GC.
>> >      >      > * Total data size is 100M-1G.
>> >      >      > * Both machines running same code, same dataset, ...
>> >      >      > * This is single-threaded, and runs early in the JVM
>> startup.
>> >      >      > * It's doing I/O over JAR-resource -> BufferedInputStream
>> ->
>> >      >      > DataInputStream but it's not I/O contended, based on the
>> >     calls in
>> >      >     the
>> >      >      > flamegraph.
>> >      >      >
>> >      >      > But I feel that a 50x slowdown in an unexplained native
>> call
>> >      >     because of
>> >      >      > ... what, the number of cores ... bears some explanation.
>> >     I can
>> >      >     post the
>> >      >      > flamegraphs if that helps.
>> >      >      >
>> >      >      > And here is the code, which is as boring as anything, so
>> >     what gives:
>> >      >      >
>> >      >      >
>> >      >      >      /** Reads a little-endian varint with no
>> optimization for
>> >      >     negative
>> >      >      > numbers. */
>> >      >      >      private static int readVarint(@Nonnull
>> >     DataInputStream in)
>> >      >     throws
>> >      >      > IOException {
>> >      >      >          int result = 0;
>> >      >      >          for (int shift = 0; shift < 32; shift += 7) {
>> >      >      >              int b = in.read();
>> >      >      >              if (b == -1)
>> >      >      >                  throw new EOFException("Truncated varint
>> in
>> >      >     stream.");
>> >      >      >              result |= (b & 0x7f) << shift;
>> >      >      >              if ((b & 0x80) == 0)
>> >      >      >                  return result;
>> >      >      >          }
>> >      >      >          throw new IOException("Malformed varint in
>> stream.");
>> >      >      >      }
>> >      >      >
>> >      >      >      @Nonnull
>> >      >      >      private static int[] readVarintTable(@Nonnull
>> >      >     DataInputStream in,
>> >      >      > @Nonnegative int sublength) throws IOException {
>> >      >      >          int[] out = new int[readVarint(in) * sublength];
>> >      >      >          for (int i = 0; i < out.length; i++)
>> >      >      >              out[i] = readVarint(in);
>> >      >      >          return out;
>> >      >      >      }
>> >      >      >
>> >      >      >      static {
>> >      >      >          try {
>> >      >      >              DataInputStream in = new DataInputStream(
>> >      >      >                  new BufferedInputStream(
>> >      >      >            Parser.class.getResourceAsStream("Parser.dat")
>> >      >      >                  )
>> >      >      >              );
>> >      >      >                          table = readVarintTable(in);
>> >      >      >                  } // etc
>> >      >      >
>>
> --
> You received this message because you are subscribed to the Google Groups
> "mechanical-sympathy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Re: Huge, unexpected performance overhead (of static methods?)

Reply via email to