On 3/26/2014 3:57 PM, David Holmes wrote:
Ivan,

I think the problem is that the EarlyTimeout threads can complete their remove(TIMEOUT) before the main thread has started them all, cleared the reference and called System.gc().

Depending on exactly what is being tested, the EarlyTimeout threads may need to wait on another latch that is signalled by the main thread after the gc() call returns. Still no guarantee that the gc will do its work before the timeout elapses.


This is similar to what I have been thinking. I believe the EarlyTimeout threads don't need the startedSignal.countDown; instead it can have a signal latch with 1 count. The EarlyTimeout threads awaits on the signal latch. The main thread will call signal.countDown after System.gc() and we can add a check weakReference.isEnqueued to make sure before we awake the threads to proceed the queue.remove call.

Ivan - for the error message, perhaps you can simply do this:

      if (nonNullRefCount != 1) {
           throw new RuntimeException(nonNullRefCount + " references were removed 
from queue");
      }


Mandy

David

On 27/03/2014 6:18 AM, Ivan Gerasimov wrote:
Thank you Mandy!

Are you able to reproduce the test failure?

Yes, I could easily reproduce the failure when I reduced the timeout to
10 ms.
With the timeout reduced, the test fails every third time on my machine.

I think the test verifies that only one thread gets the reference is a
good test.

But if none of the threads could get the reference, it should not cause
the failure of the test.
It only means that during this particular run we could not have tested
what we needed.
We could retry, but I'm not sure it's worth complicating the test.
It's easier to ignore the failure, taking into account that it happens
rarely.

I think the race is due to the threads get to call queue.remove as
soon as both threads decrement the count of the latch that can be well
before the reference is enqueued.

The problem is that we have no way to block the main thread until there
is a reference in the queue.
I improved the situation a bit, having moved the await() after the call
of gc().

It'd be good to add additional information in the test to help
diagnosing test failure.

I added reporting to stderr about being unable to remove a reference
from the queue.
I believe we shouldn't treat it as an error, and should simply ignore it.

Would you please have a look at the updated webrev:
http://cr.openjdk.java.net/~igerasim/8038333/1/webrev/

Sincerely yours,
Ivan



Reply via email to