Hi Martin

This test failed once in our nightly test. So this is a real case failure. But 
unfortunately we couldn’t know more except the log we had.

10 extra seconds may need serve 1024 loop totally, also in windows 
Thread.yield() doesn’t give up CPU most of times, then we can may have the 
situation that remover is trying to remove object from a empty queue but it 
doesn’t find anything a couple of times(it doesn’t give up cpu time) then offer 
thread get cpu time. And offer thread insert a couple of elements and queue is 
full.  Offer thread tries to give up CPU but Thread.yield() doesn’t really give 
up. Let’s assume it takes 1024 loop again. And offer thread got timeout.  Then 
remover thread try to remove elements, and it takes another 1024 loop to get to 
timeout as well. So 10 seconds may need distribute to 2048 loop at most. Every 
loop has 5 ms foreach. That’s why I simulate this case with a Thread.sleep(20). 
I admit I can’t reproduce it without changing code, this is all theoretical 
analysis. But this is closest possible reason that I can deduce. 

Thank you.

Tristan

 

发件人: Martin Buchholz [mailto:marti...@google.com] 
发送时间: Monday, February 24, 2014 10:16 AM
收件人: Tristan Yan
抄送: core-libs-dev
主题: Re: 答复: RFR for JDK-8031374: 
java/util/concurrent/ConcurrentQueues/OfferRemoveLoops.java fails Intermittently

 

I may very well be missing something, but the actual extra timeout for threads 
to finish is 10 whole seconds, which should be long enough to process a single 
chunk, even on Windows.  

 

If you can repro this consistently, try to find out which queue implementation 
is affected.  

 

We can also shrink max queue size and chunk size to reduce time to traverse the 
queue elements.

 

On Sun, Feb 23, 2014 at 4:23 PM, Tristan Yan <HYPERLINK 
"mailto:tristan....@oracle.com"tristan....@oracle.com> wrote:

Thank you for reviewing this. Martin

 

The original bug report is here: 
https://bugs.openjdk.java.net/browse/JDK-8031374.

 

I haven’t reproduced this bug but I can simulate to reproduce this failure by 
changing  yield() in remover thread to Thread.sleep(20).

You have commented “You've completely broken the intent of this code, which was 
to poll ***occasionally*** for quitting time”. I tried to not changed the logic 
for the test. This failure comes when only 1st rounds(1024 loop) wasn’t 
finished in given quitting time(before timeout). Which was 300ms. One solution 
is increasing default timeout as you suggested. But question is how big should 
we increase. When the test is really slow and could not finish 1st round(1024 
loop) before timeout, I couldn’t figure out what’s the reason timeout value. 
Also this definitely will slow down the tests when it run in fast machine. 
Which is the case most of time. So I took other step that skip wait if test 
doesn't finish 1st round(1024 loop) before timeout. I won’t say I completely 
broke the intent of the code here because it’s rare case.(Only if the machine 
is slow and slow enough that the test doesn't finish 1st round before timeout).

 

The reason that replace Thread.yield to Thread.sleep(0L) because David Holmes 
point out in the bug “sleep will certainly give up the CPU, whereas yield is 
not required to.” Also in other mail, David pointed I should use 
Thread.sleep(10L) instead of Thread.sleep(0L) preferably a bit longer as we 
don't know how the OS will behave if the sleep time requested is less than the 
natural resolution of the sleep mechanism. For the experiment I’ve done in 
Windows. Thread.yeild() or Thread.sleep(10L) won’t guarantee current thread 
give up the CPU. This is a hint to OS. This makes in windows remover and offer 
thread could wait to each other more more than other other operating system. 
This is also the one of the reason that can explain why we've seen this in 
windows only.

 

Regards

Tristan

 

 

 

发件人: Martin Buchholz [mailto:HYPERLINK 
"mailto:marti...@google.com"marti...@google.com] 
发送时间: Monday, February 24, 2014 3:47 AM
收件人: Tristan Yan
抄送: core-libs-dev
主题: Re: RFR for JDK-8031374: 
java/util/concurrent/ConcurrentQueues/OfferRemoveLoops.java fails Intermittently

 

Hi Tristan,

 

(thanks for working on my old crappy tests, and apologies for always giving you 
a hard time)

 

I couldn't find the bug report.  Can you provide a URL?

 

Thread.stop is only called in case the test has already failed (presumed 
"hung"), as a last resort.

 

Perhaps the timeout used in the test (300 ms) is too small on some systems?

 

+            protected void giveupCPU(){
+                try {
+                    Thread.sleep(0L);
+                } catch (InterruptedException ignore) {}
             }

 

I know of no reason why Thread.sleep(0) should be any more effective than 
Thread.yield.  If it was more effective, then why wouldn't Thread.yield on that 
platform be fixed to be implemented exactly the same way?  IOW if this is a 
problem, fix the JDK!

 

             /** Polls occasionally for quitting time. */
             protected boolean quittingTime(long i) {
-                return (i % 1024) == 0 && quittingTime();
+                return stopRequest || quittingTime() && (i % 1024 == 0 || i < 
1024);
+            }

 

You've completely broken the intent of this code, which was to poll 
***occasionally*** for quitting time.

 

 

On Sun, Feb 23, 2014 at 1:40 AM, Tristan Yan <HYPERLINK 
"mailto:tristan....@oracle.com"tristan....@oracle.com> wrote:

Hi All
Could you please help to review fix for JDK-803174.

http://cr.openjdk.java.net/~tyan/JDK-8031374/webrev.00/

Description:
There are a couple of code change for this code.
1. Original code has used known problematic Thread.stop. Which could cause 
ThreadDeath as all we know. I change it to a customized stopThread method.
2. Test failed in windows, I analyze the failure by changing Thread.yield() in 
remover thread to Thread.sleep(50). This is a simulation for slow machine. 
Which shows exact same failures as we can see in bug description. By adding 
more debug info, we can see although we tried to give up CPU by using 
Thread.yield().(I have tried to use Thread.sleep(1L) as well, the result is 
same), there is no guarantee that os will pick up a new thread to execute. In 
Windows, this is more obvious. Because the execution is slow. even when the 
timeout happens, offer thread and remove thread won’t have chance to get the 
point that i % 1024 == 0. This will cause the failure like we see in the log. 
My fix here is when the timeout happens, but i is still less than 1024. Stop 
offer thread and remover thread right away instead letting them continuously 
wait the point to i == 1024.
3. I replace Thread.yield to Thread.sleep(0L). I saw a couple of discussion 
that Thread.yield is not required to give up CPU.

Thank you
Tristan 

 

 

Reply via email to