Hi, I'm glad being lazy in replying to this email thread has lead to Nitsan making most of the points I was going to make and some more that I would have missed ;)
I will also add to this that the underlying code that JFR uses I believe is a bit different to AsyncGetCallTrace, even though its taking a similar approach of using sigprof signals to interrupt the program and collect stack traces without safepointing. I've noticed a couple of occassions when profiling things where JFR has misattributed time taken even with -XX:+DebugNonSafepoints switched on that Honest Profiler didn't. I suspect that async-profiler and Oracle Studio would both do as well as Honest Profiler in this regard, though I didn't try them. I've not seen it enough times to really nail down when/why it happens but if you try to profile something network IO heavy it can sometimes miss a lot of the time sending/receiving from the network. I've also seen it happen when profiling an application that was doing large quantities of compression where the Java code was calling into a native library to perform the compression. It underestimated the time in this code path. On Sun, Dec 3, 2017 at 11:38 AM, Nitsan Wakart <[email protected]> wrote: > So, as apangin points out there's an issue where JFR cannot walk the stack > safely. To add insult to injury, JFR does not report failed samples at all, > which results in a systematic omission of certain methods from the profile. > This is a massive reporting issue in my opinion, and has not been fixed in > JDK 9. I have discussed it with members of the JFR team, hopefully it will > be fixed in the near future. Honest Profiler and Async-Profiler are both > significantly better in that regard. > On top of that, JFR and other Java level profilers, rely on the debug > information provided by the JVM to help translate the sampled program > counter(PC)to a Java bytecode and relevant stack trace. The debug > information provided by default is quite sparse, and can be greatly > improved by -XX:+DebugNonSafepoints. Even after that, the translation can > be lacking, or misleading. This is due in part to certain compiler > optimisations not creating the relevant mapping information. In any case > the information is often incomplete, and where a mapping is not available > the nearest mapping is taken(e.g. No bytecode is associated with PC, but > there's a mapping available for PC+10, so report that BCI). The mapping of > a single instruction to a BCI is also at times incorrect as the PC is in > fact the combined result of many bytecodes. > Further more, instruction profiling itself suffers from certain > inaccuracies leading to the reported PC to be a few(normally 1-10, but on > rare occasions allot more) instructions after the instruction where most of > the actual cost is. > The above complications are compounded by method in lining, which results > in the compiler mixing up code from several methods together to generate a > single 'real method'. So where before you could skid a few instructions, > map to the wrong BCI, but still end up looking at the right method, with > inlining you can easily skid between lines in different methods. > This issue exists in all profilers when reporting a Java line of code or > method. Instruction level profilers will show the skid within the context > of the real method, but will provide enough context IME for you to find the > root cause. > The potential for error is quite large, and does happen. It does not > render JFR useless, but it helps to be aware of the above and reach for > other tools(honest-profiler, async-profiler, perf, Oracle Studio, VTune > etc) when the data seems suspect. Definitely start by enabling > DebugNonSafepoints. > > > > On 2 Dec 2017, at 07:29, Remko Popma <[email protected]> wrote: > > > > For background, see https://stackoverflow.com/q/47590263/1446916 > > > > Apangin’s answer seems plausible, would like to hear insights from > people on this list. > > > > Remko > > > > -- > > You received this message because you are subscribed to the Google > Groups "mechanical-sympathy" group. > > To unsubscribe from this group and stop receiving emails from it, send > an email to [email protected]. > > For more options, visit https://groups.google.com/d/optout. > > -- > You received this message because you are subscribed to the Google Groups > "mechanical-sympathy" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > For more options, visit https://groups.google.com/d/optout. > -- regards, Richard Warburton http://insightfullogic.com @RichardWarburto <http://twitter.com/richardwarburto> -- You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. For more options, visit https://groups.google.com/d/optout.
