Re: [drlvm][threading] Potential race condition in safe-point callbacks.

Evgueni Brevnov Tue, 12 Dec 2006 00:14:53 -0800

Nikolay,

I just tried to separate shutdown related from safe-point callback
stuff. No problems were identified so far. So I will file a separate
JIRA for callback related things.


On 12/12/06, Nikolay Kuznetsov <[EMAIL PROTECTED]> wrote:

Evgueni,

sorry for intrusion, at the very last moment, in general I agree with
all mentioned problems and appreciate your work done for shutdown and
suspension enhancements but if you don't mind I have I have few
questions or suggestions:

1) Am I right, that all these scenarios came from shutdown sequence,
I'm asking because, I would discuss some points but by no means I
don't want to stop your shutdown patch which is good in general, but
if those changes can go to the code base separately it would be
preferable.

2) May I ask you to separate safepoint_callback related fixes into
separate JIRA; There were several suggestions which came from Weldon
to improve safepoint callback mechanism which we can incorporate into
single patch and also it would help to review those changes faster and
easier (if it's not so hard, of 'cause).

3) Third problem in your list is a tricky one, it leads to separating
suspend into user requested and system wide, because first one should
be interrupted by stop (according to RI) and second one seems to be
stop proof.


Yes, I was thinking about that as well. It would be nice if we can
find another solution which is easier than differentiation between
"internal" and "external" requests.

Personally I prefer to exclude this test until the problem
is resolved and check your changes in for the sake of safety.


Me too.

By the
way, does shutdown sequence workaround suspended threads correctly?


No. In case of shutdown we can't even reset suspend_request counter in
the callback function because it is external component to TM.

And I would also ask Ivan Popov if such a scenario may be reserved for
JDWP?

Thank you.
  Nik.


Thanks
Evgueni

On 12/11/06, Evgueni Brevnov <[EMAIL PROTECTED]> wrote:
> Hi All,
>
> While working on http://issues.apache.org/jira/browse/HARMONY-2391 I
> found out that current implementation of safe-point callbacks works
> incorrectly/unsafely. Here is a list of problems:
>
> 1) hythread_suspend & hythread_resume are used asynchronously what may
> lead to a deadlock. Here is the scenario:
>     a) T1 wants to set a safe-point callback to T2. So it calls
> hythread_set_safe_point_callback(T2, the_callback).
>     b) T1 sets T2->safepoint_callback to the_callback;
>     c) T2 executes hythread_resume(self) which is under if
> (thread->safepoint_callback) statement.
>     d) T1 calls send_suspend_request(T2). So T2 will never be resumed
> because it already invoked corresponding hythread_resume(self)
> statement on the previous step.
>
> 2) Current implementation sets suspend_disable_count to 1
> (thread_native_suspend.c:162) and allows execution of unsafe code
> (which must be executed under suspend disabled state) while GC may be
> working.....
>
> 3) In stop_callback (thread_java_basic.c:413) suspend_request for the
> current thread is set to zero. So the thread just ignores
> suspend_requests and continue to do its dirty things.
>
> All these problems are fixed in the HARMONY-2391. But
> org.apache.harmony.luni.tests.java.lang.ThreadGroupTest.test_suspend
> starts to fail. The scenario is as follows:
>
> 1) T1 suspends T2 (so the suspend_request is >0).
> 2) T1 stops T2 by setting up the stop_callback to T2.
> 3) T2 needs to switch to suspend disabled state to be able to throw
> exception but it can't do that because of step 1)
>
> From the one hand, If I rollback my changes for the 3) problem
> described in the first list the this test works fine. From the other
> hand, I can't do it because it may lead to a crash if GC is really
> working at the time I want to throw an exception. I don't see how to
> fix it in an easy way right now.... but want to proceed with
> HARMONY-2391.
>
> What would you recommend to do?
> 1) Commit the HARMONY-2391 patch as is. File a JIRA regarding failing
> case. Exclude the test until the bug is fixed.
> 2) Commit the HARMONY-2391 patch with 3) undone. File a JIRA. In this
> case it is possible to have intermittent failures until the problem is
> not fixed.
>
> Thanks
> Evgueni
>

Re: [drlvm][threading] Potential race condition in safe-point callbacks.

Reply via email to