EarlyTimeout.java failed

Peter Levart Thu, 27 Mar 2014 05:39:46 -0700

Hi Ivan,

On 03/27/2014 08:26 AM, Ivan Gerasimov wrote:

David, Mandy, thank you for comments!
Here's what we want to achieve in the test:
Two EarlyTimeout child threads should both be blocked inremove(TIMEOUT) at the moment the weakReference gets enqueued.
This is the situation, when the bug 6853696 manifests itself.
If we made sure the weakReference is enqueued before remove() iscalled by the child threads, then they both would wait for specifiedamount of time even without the fix.
That's why we make the main thread wait for the children to startbefore calling System.gc().
I understand it may seem strange at the first glance.
But we want the child thread to be inside the remove() function whenthe main thread is unblocked, that's why it is done this way.
Unfortunately, we cannot easily make sure that remove() was called bythe child threads at the moment the GC decides to enqueue theweakReference.

There could be a little delay (say half the timeout: 500ms) specifiedafter main thread returns from the startedSignal.await(); and beforesetting referent = null; and doing System.gc(). This would decrease thechance that the reference is enqueued before EarlyTimeout threads enterqueue.remove(1000), thus making the test more reliable in failing withunpatched code.

Now even if the referent is released and System.gc() is called, thatdoes not guarantee that a WeakReference is going to be enqueued beforethe EarlyTimeout threads timeout and the result could as well be 0collected references. To increase the chance that the reference isenqueued in a timely manner, main thread could, immediately afterSystem.gc(), call:


SharedSecrets.getJavaLangRefAccess().tryHandlePendingReference();

(since SharedSecrets is in sun.misc protected package, JavaLangRefAccessinstance would have to be obtained using reflection unfortunately).

Therefore, the test is probabilistic by its nature, and we cannot beabsolutely sure that at least one child thread will get non-nullreference returned from remove().
I'm withdrawing the second webrev: While moving startedSignal.await()after System.gc() reduces the number of cases when nonNullRefCount ==0, it also greatly decreases the chances of occurring the situationthe test was designed for.
I believe the correct fix here is to ignore the (quite rare) situationwhen nonNullRefCount upon the completion.
I suggest to return to the very first trivial fix:
http://cr.openjdk.java.net/~igerasim/8038333/1/webrev/


But this webrev *is moving startedSignal.await() after System.gc()* ...

Regards, Peter

Sincerely yours,
Ivan

On 27.03.2014 4:30, Mandy Chung wrote:
On 3/26/2014 3:57 PM, David Holmes wrote:
Ivan,
I think the problem is that the EarlyTimeout threads can completetheir remove(TIMEOUT) before the main thread has started them all,cleared the reference and called System.gc().
Depending on exactly what is being tested, the EarlyTimeout threadsmay need to wait on another latch that is signalled by the mainthread after the gc() call returns. Still no guarantee that the gcwill do its work before the timeout elapses.
This is similar to what I have been thinking. I believe theEarlyTimeout threads don't need the startedSignal.countDown; insteadit can have a signal latch with 1 count. The EarlyTimeout threadsawaits on the signal latch. The main thread will callsignal.countDown after System.gc() and we can add a checkweakReference.isEnqueued to make sure before we awake the threads toproceed the queue.remove call.
Ivan - for the error message, perhaps you can simply do this:

      if (nonNullRefCount != 1) {
throw new RuntimeException(nonNullRefCount + " referenceswere removed from queue");
      }


Mandy
David

On 27/03/2014 6:18 AM, Ivan Gerasimov wrote:
Thank you Mandy!
Are you able to reproduce the test failure?
Yes, I could easily reproduce the failure when I reduced thetimeout to
10 ms.
With the timeout reduced, the test fails every third time on mymachine.
I think the test verifies that only one thread gets the referenceis a
good test.
But if none of the threads could get the reference, it should notcause
the failure of the test.
It only means that during this particular run we could not have tested
what we needed.
We could retry, but I'm not sure it's worth complicating the test.
It's easier to ignore the failure, taking into account that it happens
rarely.
I think the race is due to the threads get to call queue.remove as
soon as both threads decrement the count of the latch that can bewell
before the reference is enqueued.
The problem is that we have no way to block the main thread untilthere
is a reference in the queue.
I improved the situation a bit, having moved the await() after thecall
of gc().
It'd be good to add additional information in the test to help
diagnosing test failure.
I added reporting to stderr about being unable to remove a reference
from the queue.
I believe we shouldn't treat it as an error, and should simplyignore it.
Would you please have a look at the updated webrev:
http://cr.openjdk.java.net/~igerasim/8038333/1/webrev/

Sincerely yours,
Ivan

Re: RFR [8038333] java/lang/ref/EarlyTimeout.java failed

Reply via email to