Re: How badly do JFR stack traces lie?

Richard Warburton Sun, 03 Dec 2017 04:15:54 -0800

Hi,

I'm glad being lazy in replying to this email thread has led to Nitsan
making most of the points I was going to make and some more that I would
have missed ;)


I will also add that the underlying code JFR uses is, I believe, a bit
different to AsyncGetCallTrace, even though it's taking a similar approach
of using SIGPROF signals to interrupt the program and collect stack traces
without safepointing. I've noticed a couple of occasions when profiling
where JFR misattributed time, even with -XX:+DebugNonSafepoints switched
on, in cases where Honest Profiler didn't. I suspect that async-profiler
and Oracle Studio would both do as well as Honest Profiler in this regard,
though I didn't try them. I've not seen it enough times to really nail down
when or why it happens, but if you profile something network IO heavy, JFR
can sometimes miss a lot of the time spent sending to and receiving from
the network. I've also seen it happen when profiling an application that
was doing large quantities of compression, where the Java code was calling
into a native library to perform the compression. JFR underestimated the
time spent in this code path.

On Sun, Dec 3, 2017 at 11:38 AM, Nitsan Wakart <[email protected]> wrote:

> So, as apangin points out there's an issue where JFR cannot walk the stack
> safely. To add insult to injury, JFR does not report failed samples at all,
> which results in a systematic omission of certain methods from the profile.
> This is a massive reporting issue in my opinion, and has not been fixed in
> JDK 9. I have discussed it with members of the JFR team; hopefully it will
> be fixed in the near future. Honest Profiler and Async-Profiler are both
> significantly better in that regard.
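
To put rough numbers on the effect of dropping failed samples, here is a
minimal Java sketch. The method names and sample counts are invented for
illustration, and this is not how JFR computes anything internally; it only
compares the percentages you get with and without a bucket for failed stack
walks.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of a sampling profile; all names and counts are hypothetical. */
final class DroppedSamples {

    /** Converts raw sample counts into percentages of the total of those counts. */
    static Map<String, Double> percentages(Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            result.put(e.getKey(), 100.0 * e.getValue() / total);
        }
        return result;
    }

    /** Samples whose stack walk succeeded. "copy" was hit 500 times, but 400 walks failed. */
    static Map<String, Integer> walkedSamples() {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("parse", 300);
        counts.put("encode", 200);
        counts.put("copy", 100);
        return counts;
    }

    public static void main(String[] args) {
        // Profile that silently drops failed samples: only successful walks are counted.
        System.out.println("dropped:  " + percentages(walkedSamples()));

        // Profile that reports failures as their own bucket.
        Map<String, Integer> withFailures = walkedSamples();
        withFailures.put("[failed stack walk]", 400);
        System.out.println("reported: " + percentages(withFailures));
    }
}
```

With these made-up counts, "copy" really accounts for half of the samples,
yet the profile that drops failures shows it at roughly 17%, behind "parse"
at 50%. Keeping the failures visible at least shows that 40% of the samples
are unaccounted for.
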
> On top of that, JFR and other Java-level profilers rely on the debug
> information provided by the JVM to help translate the sampled program
> counter (PC) to a Java bytecode index (BCI) and the relevant stack trace.
> The debug information provided by default is quite sparse, and can be
> greatly improved by -XX:+DebugNonSafepoints. Even after that, the
> translation can be lacking or misleading. This is due in part to certain
> compiler optimisations not creating the relevant mapping information. In
> any case the information is often incomplete, and where a mapping is not
> available the nearest mapping is taken (e.g. no bytecode is associated
> with PC, but there's a mapping available for PC+10, so report that BCI).
> The mapping of a single instruction to a BCI is also at times incorrect,
> as the instruction at that PC is in fact the combined result of many
> bytecodes.
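
The "nearest mapping" behaviour described above can be sketched with a
sorted map. This is a toy model, not HotSpot's actual data structure: the
class name, the offsets and the BCI values are all made up, and it assumes
the fallback is always the next recorded entry at a higher PC, as in the
PC+10 example.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy PC-to-BCI table; a hypothetical stand-in for the JVM's debug information. */
final class NearestBciLookup {
    private final TreeMap<Long, Integer> pcToBci = new TreeMap<>();

    /** Records that the instruction at pcOffset has debug info pointing at bci. */
    void record(long pcOffset, int bci) {
        pcToBci.put(pcOffset, bci);
    }

    /**
     * Returns the BCI recorded for pcOffset, or failing that the BCI of the
     * next recorded entry at a higher PC. Returns -1 if there is no such entry.
     */
    int resolve(long pcOffset) {
        Map.Entry<Long, Integer> nearest = pcToBci.ceilingEntry(pcOffset);
        return nearest == null ? -1 : nearest.getValue();
    }

    public static void main(String[] args) {
        NearestBciLookup table = new NearestBciLookup();
        table.record(0x20, 3);   // sparse: only two PCs carry debug info
        table.record(0x3a, 17);

        System.out.println(table.resolve(0x20)); // exact hit
        System.out.println(table.resolve(0x30)); // no entry at 0x30; falls through to 0x3a (0x30 + 10)
        System.out.println(table.resolve(0x40)); // nothing at or after this PC
    }
}
```

Here a sample at 0x30 is reported against BCI 17, even though that bytecode
may have nothing to do with the instruction that was actually executing.
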
> Furthermore, instruction profiling itself suffers from certain
> inaccuracies (skid), causing the reported PC to be a few (normally 1-10,
> but on rare occasions a lot more) instructions after the instruction where
> most of the actual cost is.
> The above complications are compounded by method inlining, which results
> in the compiler mixing code from several methods together to generate a
> single 'real method'. So where before you could skid a few instructions,
> map to the wrong BCI, but still end up looking at the right method, with
> inlining you can easily skid between lines in different methods.
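
A small Java sketch of why skid matters more once methods are inlined. The
instruction layouts and method names below are hypothetical; each array
entry just says which source method a given machine instruction came from.

```java
/** Toy model of skid inside a compiled method; layouts and method names are hypothetical. */
final class SkidAcrossInlining {

    /** A method compiled on its own: every instruction belongs to it. */
    static final String[] STANDALONE = {
        "hotLoop", "hotLoop", "hotLoop", "hotLoop", "hotLoop", "hotLoop"
    };

    /** The same code inlined into a caller, next to another inlined helper. */
    static final String[] INLINED = {
        "caller", "caller", "hotLoop", "hotLoop", "helper", "helper", "caller"
    };

    /**
     * Returns the source method blamed for a sample, given the index of the
     * instruction that really incurred the cost and how far the sampled PC skidded.
     */
    static String attribute(String[] layout, int costlyInstruction, int skid) {
        int sampled = Math.min(costlyInstruction + skid, layout.length - 1);
        return layout[sampled];
    }

    public static void main(String[] args) {
        // Without inlining, a skid of 2 still lands in the right method.
        System.out.println(attribute(STANDALONE, 3, 2));
        // With inlining, the same skid crosses into code that came from another method.
        System.out.println(attribute(INLINED, 3, 2));
    }
}
```

In the inlined layout the cost incurred in "hotLoop" is blamed on "helper",
which is the cross-method misattribution described above.
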
> This issue exists in all profilers when reporting a Java line of code or
> method. Instruction-level profilers will show the skid within the context
> of the real method, but will provide enough context IME for you to find the
> root cause.
> The potential for error is quite large, and errors do happen. This does
> not render JFR useless, but it helps to be aware of the above and reach
> for other tools (honest-profiler, async-profiler, perf, Oracle Studio,
> VTune etc.) when the data seems suspect. Definitely start by enabling
> DebugNonSafepoints.
>
>
> > On 2 Dec 2017, at 07:29, Remko Popma <[email protected]> wrote:
> >
> > For background, see https://stackoverflow.com/q/47590263/1446916
> >
> > Apangin’s answer seems plausible, would like to hear insights from
> > people on this list.
> >
> > Remko
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "mechanical-sympathy" group.
> > To unsubscribe from this group and stop receiving emails from it, send
> > an email to [email protected].
> > For more options, visit https://groups.google.com/d/optout.
>



-- 
regards,

  Richard Warburton

  http://insightfullogic.com
  @RichardWarburto <http://twitter.com/richardwarburto>
