Treating stack overflows as fatal errors makes sense for a
JVM running a single application. This could be the subject
of an RFE: the feature is well defined and the implementation
should not be too complex.

However, JEP-270 has been designed with multi-tenant applications
in mind. In this context, we'd like to avoid having to crash the
VM and restart the application and all tenants because one
tenant had a misbehaving thread. The reserved stack area is used
to protect the critical locks of the host application, in order
to give it a chance to cleanly kill the problematic tenant without
impacting the others.
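
To make the multi-tenant goal concrete, here is a minimal sketch of the
outcome the host wants: one tenant overflows its stack, only that tenant is
ended, and the others keep running. All names (TenantHost, Outcome, demo) are
hypothetical. The sketch does not use the reserved stack area or
@ReservedStackAccess, which is JDK-internal; it only shows the containment
behaviour that mechanism is meant to keep possible when host locks are involved.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical host that runs each tenant on its own thread. */
final class TenantHost {
    enum Outcome { COMPLETED, KILLED_STACK_OVERFLOW }

    private final Map<String, Outcome> outcomes = new ConcurrentHashMap<>();

    /** Starts a tenant; a StackOverflowError ends only that tenant's thread. */
    Thread start(String tenant, Runnable work) {
        Thread t = new Thread(() -> {
            try {
                work.run();
                outcomes.put(tenant, Outcome.COMPLETED);
            } catch (StackOverflowError e) {
                // The stack has unwound to this shallow frame, so recording is safe here.
                outcomes.put(tenant, Outcome.KILLED_STACK_OVERFLOW);
            }
        }, "tenant-" + tenant);
        t.start();
        return t;
    }

    Outcome outcomeOf(String tenant) {
        return outcomes.get(tenant);
    }

    /** Unbounded recursion, used to simulate the misbehaving tenant. */
    static void recurseForever() {
        recurseForever();
    }

    /** Runs one misbehaving and one well-behaved tenant and waits for both. */
    static TenantHost demo() {
        TenantHost host = new TenantHost();
        Thread bad = host.start("bad", TenantHost::recurseForever);
        Thread good = host.start("good", () -> { });
        try {
            bad.join();
            good.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return host;
    }

    public static void main(String[] args) {
        TenantHost host = demo();
        System.out.println("bad=" + host.outcomeOf("bad") + ", good=" + host.outcomeOf("good"));
    }
}
```

The sketch works only because the overflow happens in tenant code that holds
no host lock. The hard case is an overflow in the middle of a host critical
section, which is the situation the reserved stack area targets.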

Regards,

Fred

On 24/11/2015 19:16, Steven Schlansker wrote:

On Nov 24, 2015, at 8:46 AM, Karen Kinnear <karen.kinn...@oracle.com> wrote:

Doug,

I have been thinking about this more from the perspective of the original
problem we set out to solve

I apologize if this has already been considered -- but for a lot of
well-designed systems, occasional application failure is an expected fact of
life, and we design our HA around this with automatic restarts and monitoring.

If it is so hard to detect / resolve a stack overflow situation, maybe one
useful mitigation of such awful situations (juc hangs, corrupt state, lost
locks) would be to actually treat a stack overflow as a fatal condition, much
like OutOfMemoryError.

In fact, we configure all of our production servers with the moral equivalent of
-XX:OnOutOfMemoryError="kill -9 %p"
because once we are in a possibly inconsistent state, we would much rather nuke
it from orbit and start over.

Maybe introducing some new options, like
-XX:OnStackOverflowError=
or
-XX:TreatStackOverflowAsOOM (piggyback on the existing tunable above)
would allow end users to avoid the really bad behavior in a controllable way?
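
Until such an option exists, something similar can be approximated at the
application level. The following is only a sketch: the class name, the
injectable action, and the exit status 70 are assumptions chosen for
illustration, and it reacts only to a StackOverflowError that escapes a thread
uncaught, so it is much weaker than a VM-level flag.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

/** Hypothetical application-level stand-in for the proposed -XX:OnStackOverflowError. */
final class FatalStackOverflowHandler implements Thread.UncaughtExceptionHandler {
    private static final int EXIT_STATUS = 70; // arbitrary status, an assumption

    private final IntConsumer onOverflow; // receives the exit status

    FatalStackOverflowHandler(IntConsumer onOverflow) {
        this.onOverflow = onOverflow;
    }

    /** Production variant: halt at once, skipping shutdown hooks and finalizers. */
    static FatalStackOverflowHandler halting() {
        return new FatalStackOverflowHandler(status -> Runtime.getRuntime().halt(status));
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        // Runs on the dying thread after its stack has unwound.
        if (e instanceof StackOverflowError) {
            onOverflow.accept(EXIT_STATUS);
        }
    }

    /** Unbounded recursion, used to provoke a StackOverflowError. */
    static void recurse() {
        recurse();
    }

    /** Overflows the stack of a fresh thread that uses the given handler, then waits for it. */
    static void overflowOnThread(Thread.UncaughtExceptionHandler handler) {
        Thread t = new Thread(FatalStackOverflowHandler::recurse, "overflowing-thread");
        t.setUncaughtExceptionHandler(handler);
        t.start();
        try {
            t.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // Record the status instead of halting so the demo can report it.
        AtomicInteger status = new AtomicInteger(-1);
        overflowOnThread(new FatalStackOverflowHandler(status::set));
        System.out.println("handler saw overflow, would halt with status " + status.get());
    }
}
```

In a real server one would install it with
Thread.setDefaultUncaughtExceptionHandler(FatalStackOverflowHandler.halting()).
Any code that catches Throwable or StackOverflowError further down the stack
defeats it, which is one reason a VM option would be preferable.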


, which was identified in the concurrent hash map usage, at the
time in the class loading logic. While the class loading logic has changed, I
think we have enough experience with this particular example and have studied
the code constructs sufficiently that there is value in checking in the small
set of JDK changes that target that situation. I also think this gives a sample
of the kind of model in which this approach can be effective. In addition,
having this small set of changes provides the ability to test and ensure that
the hotspot changes continue to work.

So I would like to recommend that we go ahead and check in the hotspot changes
and the initial minimal set of j.u.c. updates as a way to put the new mechanism
in place so that the people with more domain expertise in the
java.util.concurrent libraries can experiment with the mechanism and add
incremental improvements.

thanks,
Karen

On Nov 22, 2015, at 7:04 PM, Doug Lea <d...@cs.oswego.edu> wrote:

On 11/20/2015 12:40 PM, Karen Kinnear wrote:
Totally appreciate the suggestion that the java.util.concurrent modifications
be done by folks with more domain expertise.

Would you have us incorporate the initial minimal set of j.u.c. updates or none
at all?

Sorry that I'm still in foot-drag mode on this.
Reading David and Fred's exchanges reinforces my thoughts
that there is no defensible rule or approach to
use @ReservedStackAccess so as to add as little time and
space as possible to reduce the occurrence of stuck
resources as much as possible during StackOverflowError.

After googling "StackOverflowError java util concurrent" and seeing the
range of situations that can be encountered, I don't even know
which kinds of constructions to target.
And I'm less sure whether using @ReservedStackAccess at all
is better than doing nothing.

Maybe there is some decent empirical strategy, but I can't
tell until hotspot support of @ReservedStackAccess is in place.
So my vote is still to keep the JDK changes out for now.

-Doug




--
Frederic Parain - Oracle
Grenoble Engineering Center - France
Phone: +33 4 76 18 81 17
Email: frederic.par...@oracle.com
