Re: [jvm-l] Improving the performance of stacktrace generation

Charles Oliver Nutter Thu, 11 Apr 2013 10:13:40 -0700

I talked a bit with John Rose about this, and he agreed with me that a
good partial measure might be to add APIs for getting a *partial*
stack.


Currently, Hotspot limits how deep a stack trace it will generate, and
that limit can have a very large impact on the cost of generating
traces.

The magic flag is -XX:MaxJavaStackTraceDepth=####, and the default on
my system is 1024. Here's a set of benchmarks of various trace depths
from 1000 down to 2. Once you get down to 100 frames, performance of
generating a stack trace starts to improve considerably.

https://gist.github.com/headius/5365217
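
For anyone who wants a feel for the effect without the gist, here is a
rough sketch. It is not the benchmark from the gist: instead of varying
the flag, it varies the real stack depth by recursing before capturing
a trace. The names (TraceDepthBench, traceAtDepth, nanosPerTrace) are
made up for illustration, and timings from a naive loop like this are
only indicative.

```java
final class TraceDepthBench {
    // Recurse until `depth` extra frames are on the stack, then capture a trace.
    static StackTraceElement[] traceAtDepth(int depth) {
        if (depth > 0) {
            return traceAtDepth(depth - 1);
        }
        return new Throwable().getStackTrace();
    }

    // Average nanoseconds per capture at the given recursion depth.
    static long nanosPerTrace(int depth, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            traceAtDepth(depth);
        }
        return (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        int[] depths = {1000, 500, 100, 10, 2};
        for (int depth : depths) {
            nanosPerTrace(depth, 2_000); // warm-up pass, result discarded
            System.out.println(depth + " frames: "
                    + nanosPerTrace(depth, 2_000) + " ns/trace");
        }
    }
}
```

Traces deeper than -XX:MaxJavaStackTraceDepth are truncated, so depths
above that limit would all report roughly the same cost.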

Unfortunately there's no API to get just a partial stack trace, via
JVMTI or otherwise. The relevant code in Hotspot itself is rather
simple; I started prototyping a JNI call that would allow getting a
partial trace. Perhaps something like:

thread.getStackTrace(depth)

...and something equivalent for JVMTI.
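
To make the shape of that call concrete, here is a sketch that emulates
it on top of the existing API. PartialTrace and topFrames are
hypothetical names, not anything in the JDK, and the emulation still
pays for the full trace before truncating it, which is exactly the cost
a real VM-level API would avoid.

```java
import java.util.Arrays;

final class PartialTrace {
    // Hypothetical helper: returns at most `depth` frames, innermost first,
    // starting at the caller of this method.
    static StackTraceElement[] topFrames(int depth) {
        if (depth < 0) {
            throw new IllegalArgumentException("depth must be >= 0");
        }
        // The whole stack is still walked here; only the result is trimmed.
        StackTraceElement[] full = new Throwable().getStackTrace();
        int from = Math.min(1, full.length);          // skip topFrames itself
        int to = Math.min(full.length, from + depth); // clamp to what exists
        return Arrays.copyOfRange(full, from, to);
    }

    public static void main(String[] args) {
        for (StackTraceElement frame : topFrames(2)) {
            System.out.println(frame);
        }
    }
}
```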

John agreed that this would be a worthwhile feature for a JEP, and I'd
certainly like to see it trickle into a standard API too.

- Charlie

On Thu, Apr 11, 2013 at 3:37 AM,  <william.lo...@jinspired.com> wrote:
> Hi Bob,
>
> I wrote an article last year on the cost and impact of JVMTI stack collection.
>
> http://www.jinspired.com/site/is-jvm-call-stack-sampling-suitable-for-monitoring-low-latency-trading-apps
>
> I would prefer to see the JVM come up with a standard API and mechanism to 
> allow the stack to be augmented with additional frames that not only include 
> Java code but more contextual information related to executing activity 
> (code, block, flow,....) this would include other JVM languages.
>
> We provide this sort of thing already today for Java, JRuby/Ruby and 
> Jython/Python, even SQL, in our metering engine but would welcome an ability 
> to replicate this data to the VM itself so standard tools need not be 
> changed. What is cool about this is that we can simulate a stack in a remote 
> JVM that spans multiple real application runtimes.
>
> http://www.jinspired.com/site/jxinsight-opencore-6-4-ea-12-released
>
> Kind regards,
>
> William
>
>>-----Original Message-----
>>From: Bob Foster [mailto:bobfos...@gmail.com]
>>Sent: Sunday, July 8, 2012 01:32 AM
>>To: jvm-langua...@googlegroups.com
>>Cc: 'Da Vinci Machine Project'
>>Subject: Re: [jvm-l] Improving the performance of stacktrace generation
>>
>>> Any thoughts on this? Does anyone else have need for
>>lighter-weight name/file/line inspection of the call stack?
>>
>>Well, yes. Profilers do.
>>
>>Recall Cliff Click bragging a couple of years ago at the JVM Language
>>Summit about how fast stack trace generation is in Azul Systems' OSs...and
>>knocking Hotspot for being so slow. It turns out that stack trace
>>generation is a very significant overhead in profiling Hotspot using JVMTI.
>>Even CPU sampling on 20 ms. intervals can add 3% or more to execution time,
>>almost entirely due to the delay in reaching a safe point (which also
>>guarantees the profile will be incorrect) and generating a stack trace for
>>each thread.
>>
>>But 3% is peanuts compared to the cost of memory profiling, which can
>>require a stack trace on every new instance creation. In a profiler I wrote
>>using JVMTI, I discovered that it was faster to call into JNI code on every
>>method entry and exit (and exception catch), maintaining a stack trace
>>dynamically, than to call into JNI only when memory was allocated and
>>request a stack trace each time. The "fast" technique is about 3-10 times
>>slower than running without profiling. The NetBeans profiler doesn't use
>>this optimization, and its memory profiler, when capturing every allocation
>>as I did, is 2-3 ORDERS OF MAGNITUDE slower than normal (non-server)
>>execution.
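
The bookkeeping Bob describes can be sketched in plain Java. His
profiler did this in JNI code driven by JVMTI events; the class below
(ShadowStack, a made-up name) only illustrates the idea of keeping the
stack up to date on entry, exit and catch so that a snapshot never has
to walk the real stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class ShadowStack {
    // One stack of method names per thread, innermost frame at the head.
    private static final ThreadLocal<Deque<String>> FRAMES =
            new ThreadLocal<Deque<String>>() {
                @Override
                protected Deque<String> initialValue() {
                    return new ArrayDeque<String>();
                }
            };

    // Called on method entry.
    static void enter(String method) {
        FRAMES.get().push(method);
    }

    // Called on normal method exit.
    static void exit() {
        FRAMES.get().pop();
    }

    static int depth() {
        return FRAMES.get().size();
    }

    // Called on exception catch: drop the frames that were unwound past.
    static void unwindTo(int depth) {
        Deque<String> frames = FRAMES.get();
        while (frames.size() > depth) {
            frames.pop();
        }
    }

    // Cheap "stack trace": copy the current names, innermost first.
    static String[] snapshot() {
        return FRAMES.get().toArray(new String[0]);
    }

    public static void main(String[] args) {
        enter("main");
        enter("allocate");
        System.out.println(String.join(" <- ", snapshot()));
        exit();
        exit();
    }
}
```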
>>
>>Faster stack traces would benefit the entire Hotspot profiling community.
>>
>>Bob
>>
>>On Sat, Jul 7, 2012 at 3:03 PM, Charles Oliver Nutter
>><head...@headius.com> wrote:
>>
>>> Today I have a new conundrum for you all: I need stack trace
>>> generation on Hotspot to be considerably faster than it is now.
>>>
>>> In order to simulate many Ruby features, JRuby (over)uses Java stack
>>> traces. We recently (JRuby 1.6, about a year ago) moved to using the
>>> Java stack trace as the source of our Ruby backtrace information,
>>> mining out compiled frames and using interpreter markers to peel off
>>> interpreter frames. The result is that a Ruby trace with mixed
>>> compiled and interpreted code like this
>>> (https://gist.github.com/3068210) turns into this
>>> (https://gist.github.com/3068213). I consider this a great deal better
>>> than the plain Java trace, and I know other language implementers have
>>> lamented the verbosity of stack traces coming out of their languages.
>>>
>>> The unfortunate thing is that stack trace generation is very expensive
>>> in the JVM, and in order to generate normal exceptions and emulate
>>> other features we sometimes generate a lot of them. I think there's
>>> value in exploring how we can make stack trace generation cheaper at
>>> the JVM level.
>>>
>>> Here are a few cases in Ruby where we need to use Java stack traces to
>>> provide the same features:
>>>
>>> * Exceptions as non-exceptional or moderately-exceptional method results
>>>
>>> In this case I'm specifically thinking about Ruby's tendency to
>>> propagate errno values as exceptions; EAGAIN/EWOULDBLOCK for example
>>> are thrown from nonblocking IO methods when there's no data available.
>>>
>>> You will probably say "that's a horrible use for exceptions" and I
>>> agree. But there are a couple of reasons why it's nicer too:
>>> - using return value sigils requires you to propagate them back out
>>> through many levels of calls
>>> - exception-handling is cleaner in code than having all your errno
>>> handling logic spliced into regular program flow
>>>
>>> In any case, the cost of generating a stack trace for potentially
>>> every non-blocking IO call is obviously too high. In JRuby, we default
>>> to having EAGAIN/EWOULDBLOCK exceptions not generate a stack trace,
>>> and you must pass a flag for them to do so. The justification is that
>>> these exceptions are almost always used to branch back to the top of a
>>> nonblocking IO loop, so the backtrace is useless.
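
On the Java side, one way to get an exception that skips trace
generation is the four-argument Throwable constructor added in Java 7.
The sketch below is only an illustration of that mechanism, not JRuby's
actual implementation; WouldBlock and the withTrace switch are
hypothetical names.

```java
final class StacklessExceptionDemo {
    // Hypothetical stand-in for an errno-style exception such as EAGAIN.
    static final class WouldBlock extends RuntimeException {
        WouldBlock(String message, boolean withTrace) {
            // Arguments: message, cause, enableSuppression, writableStackTrace.
            // With writableStackTrace == false no trace is captured at all.
            super(message, null, false, withTrace);
        }
    }

    public static void main(String[] args) {
        WouldBlock cheap = new WouldBlock("EAGAIN", false);
        WouldBlock full = new WouldBlock("EAGAIN", true);
        System.out.println("frames without trace: " + cheap.getStackTrace().length);
        System.out.println("frames with trace:    " + full.getStackTrace().length);
    }
}
```

Overriding fillInStackTrace() to return `this` is the older way to get
a similar effect on JVMs that predate that constructor.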
>>>
>>> * Getting the current or previous method's name/file/line
>>>
>>> Ruby supports a number of features that allow you to get basic
>>> information about the method currently being executed or the method
>>> that called it. The most general of these features is the "caller"
>>> method, which provides an array of all method name + file + line that
>>> would appear in a stack trace at this point. This feature is often
>>> abused to get only the current or previous frame, and so in Ruby 1.9
>>> they added __method__ to get the currently-executing method's
>>> name.
>>>
>>> In both cases, we must generate a full Java trace for these methods
>>> because the name of a method body is not necessarily statically known.
>>> We often want only the current frame or the current and previous
>>> frames, but we pay the cost of generating an entire Java stack trace
>>> to get them.
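
In plain Java the pattern looks roughly like the sketch below: only one
or two elements are read, but the whole trace is built first. FrameInfo
and its methods are illustrative names, not JRuby code.

```java
final class FrameInfo {
    // Frame of the method that called currentFrame().
    static StackTraceElement currentFrame() {
        // Index 0 is currentFrame itself; index 1 is its caller.
        return new Throwable().getStackTrace()[1];
    }

    // Frame of the caller of the method that called callerFrame(),
    // or null if the stack is not that deep.
    static StackTraceElement callerFrame() {
        StackTraceElement[] trace = new Throwable().getStackTrace();
        return trace.length > 2 ? trace[2] : null;
    }

    static String whoAmI() {
        return currentFrame().getMethodName();
    }

    static String outer() {
        return inner();
    }

    static String inner() {
        StackTraceElement caller = callerFrame();
        return caller == null ? null : caller.getMethodName();
    }

    public static void main(String[] args) {
        System.out.println("current method: " + whoAmI());
        System.out.println("inner was called by: " + outer());
    }
}
```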
>>>
>>> * Warnings that actually report the line of code that triggered them
>>>
>>> In Ruby, it is possible to generate non-fatal warnings to stderr. In
>>> many cases, these warnings automatically include the file and line at
>>> which the triggering code lives. Because the warning logic is
>>> downstream from the Ruby code, we again must use a full Java stack
>>> trace to find the most recent (on stack) Ruby frame. This causes
>>> warnings to be as expensive as regular exceptions.
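
The lookup amounts to scanning a full trace for the first frame that
belongs to user code. A minimal sketch, assuming user frames can be
recognized by a class-name prefix (JRuby's real markers are different,
and WarningSite, Script and Support are invented for the example):

```java
final class WarningSite {
    // Innermost frame, excluding this method, whose class name starts
    // with the given prefix; null if there is none.
    static StackTraceElement innermostFrameIn(String classPrefix) {
        StackTraceElement[] trace = new Throwable().getStackTrace();
        for (int i = 1; i < trace.length; i++) {
            if (trace[i].getClassName().startsWith(classPrefix)) {
                return trace[i];
            }
        }
        return null;
    }

    // Stands in for runtime support code that sits downstream of user code.
    static final class Support {
        static String warn(String message) {
            StackTraceElement site = innermostFrameIn(Script.class.getName());
            if (site == null) {
                return message;
            }
            return site.getFileName() + ":" + site.getLineNumber()
                    + ": warning: " + message;
        }
    }

    // Stands in for compiled user code that triggers a warning.
    static final class Script {
        static String run() {
            return Support.warn("deprecated");
        }
    }

    public static void main(String[] args) {
        System.err.println(Script.run());
    }
}
```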
>>>
>>> Because the use of frame introspection (in this case through stack
>>> traces) has largely been ignored on current JVMs, I suspect there's a
>>> lot of improvement possible. At a minimum, the ability to only grab
>>> the top N frames from the stack trace could be a great improvement
>>> (Hotspot even has flags to restrict how large a trace it will
>>> generate, presumably to avoid the cost of accounting for deep stacks
>>> and generating traces from them).
>>>
>>> Any thoughts on this? Does anyone else have need for lighter-weight
>>> name/file/line inspection of the call stack?
>>>
>>> - Charlie
>>>
>>
>
>
> _______________________________________________
> mlvm-dev mailing list
> mlvm-dev@openjdk.java.net
> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
