Tristan,

Can you please file a new bug and bring in the changes that Martin pushed to 
the 166 CVS.
  
http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/test/jtreg/util/concurrent/ConcurrentQueues/OfferRemoveLoops.java?view=co

Create a changeset, and I can do the OpenJDK review.

-Chris.

On 24 Feb 2014, at 03:03, Tristan Yan <tristan....@oracle.com> wrote:

> I am ok with this unfix for now. Martin. And thank you for bringing the 
> improvement to jsr166 CVS.
> 
> Regards
> 
> Tristan
> 
>  
> 
> 发件人: Martin Buchholz [mailto:marti...@google.com] 
> 发送时间: Monday, February 24, 2014 10:59 AM
> 收件人: Tristan Yan
> 抄送: core-libs-dev
> 主题: Re: 答复: 答复: RFR for JDK-8031374: 
> java/util/concurrent/ConcurrentQueues/OfferRemoveLoops.java fails 
> Intermittently
> 
>  
> 
> Hi Tristan,
> 
>  
> 
> I don't think we should change the test without a sufficiently solid reason, 
> which usually means reproducing the failure, even if e.g. only once in 1000 
> times.  Here we don't know which queue implementation is causing the failure, 
> and we don't know what chunk size was being used, which might cause the 
> failure to be reproduced more reliably.  A single test failure might be due 
> to cosmic rays!  Let's leave unfixed this while it cannot be reproduced.  You 
> can check in my general purpose improvements to the test from jsr166 CVS.
> 
>  
> 
> On Sun, Feb 23, 2014 at 6:41 PM, Tristan Yan <HYPERLINK 
> "mailto:tristan....@oracle.com"tristan....@oracle.com> wrote:
> 
> Hi Martin
> 
> This test failed once in our nightly test. So this is a real case failure. 
> But unfortunately we couldn’t know more except the log we had.
> 
> 10 extra seconds may need serve 1024 loop totally, also in windows 
> Thread.yield() doesn’t give up CPU most of times, then we can may have the 
> situation that remover is trying to remove object from a empty queue but it 
> doesn’t find anything a couple of times(it doesn’t give up cpu time) then 
> offer thread get cpu time. And offer thread insert a couple of elements and 
> queue is full.  Offer thread tries to give up CPU but Thread.yield() doesn’t 
> really give up. Let’s assume it takes 1024 loop again. And offer thread got 
> timeout.  Then remover thread try to remove elements, and it takes another 
> 1024 loop to get to timeout as well. So 10 seconds may need distribute to 
> 2048 loop at most. Every loop has 5 ms foreach. That’s why I simulate this 
> case with a Thread.sleep(20). I admit I can’t reproduce it without changing 
> code, this is all theoretical analysis. But this is closest possible reason 
> that I can deduce. 
> 
> Thank you.
> 
> Tristan
> 
>  
> 
> 发件人: Martin Buchholz [mailto:HYPERLINK 
> "mailto:marti...@google.com"marti...@google.com] 
> 发送时间: Monday, February 24, 2014 10:16 AM
> 
> 
> 收件人: Tristan Yan
> 抄送: core-libs-dev
> 
> 主题: Re: 答复: RFR for JDK-8031374: 
> java/util/concurrent/ConcurrentQueues/OfferRemoveLoops.java fails 
> Intermittently
> 
>  
> 
> I may very well be missing something, but the actual extra timeout for 
> threads to finish is 10 whole seconds, which should be long enough to process 
> a single chunk, even on Windows.  
> 
>  
> 
> If you can repro this consistently, try to find out which queue 
> implementation is affected.  
> 
>  
> 
> We can also shrink max queue size and chunk size to reduce time to traverse 
> the queue elements.
> 
>  
> 
> On Sun, Feb 23, 2014 at 4:23 PM, Tristan Yan <HYPERLINK 
> "mailto:tristan....@oracle.com"tristan....@oracle.com> wrote:
> 
> Thank you for reviewing this. Martin
> 
>  
> 
> The original bug report is here: 
> https://bugs.openjdk.java.net/browse/JDK-8031374.
> 
>  
> 
> I haven’t reproduced this bug but I can simulate to reproduce this failure by 
> changing  yield() in remover thread to Thread.sleep(20).
> 
> You have commented “You've completely broken the intent of this code, which 
> was to poll ***occasionally*** for quitting time”. I tried to not changed the 
> logic for the test. This failure comes when only 1st rounds(1024 loop) wasn’t 
> finished in given quitting time(before timeout). Which was 300ms. One 
> solution is increasing default timeout as you suggested. But question is how 
> big should we increase. When the test is really slow and could not finish 1st 
> round(1024 loop) before timeout, I couldn’t figure out what’s the reason 
> timeout value. Also this definitely will slow down the tests when it run in 
> fast machine. Which is the case most of time. So I took other step that skip 
> wait if test doesn't finish 1st round(1024 loop) before timeout. I won’t say 
> I completely broke the intent of the code here because it’s rare case.(Only 
> if the machine is slow and slow enough that the test doesn't finish 1st round 
> before timeout).
> 
>  
> 
> The reason that replace Thread.yield to Thread.sleep(0L) because David Holmes 
> point out in the bug “sleep will certainly give up the CPU, whereas yield is 
> not required to.” Also in other mail, David pointed I should use 
> Thread.sleep(10L) instead of Thread.sleep(0L) preferably a bit longer as we 
> don't know how the OS will behave if the sleep time requested is less than 
> the natural resolution of the sleep mechanism. For the experiment I’ve done 
> in Windows. Thread.yeild() or Thread.sleep(10L) won’t guarantee current 
> thread give up the CPU. This is a hint to OS. This makes in windows remover 
> and offer thread could wait to each other more more than other other 
> operating system. This is also the one of the reason that can explain why 
> we've seen this in windows only.
> 
>  
> 
> Regards
> 
> Tristan
> 
>  
> 
>  
> 
>  
> 
> 发件人: Martin Buchholz [mailto:HYPERLINK 
> "mailto:marti...@google.com"marti...@google.com] 
> 发送时间: Monday, February 24, 2014 3:47 AM
> 收件人: Tristan Yan
> 抄送: core-libs-dev
> 主题: Re: RFR for JDK-8031374: 
> java/util/concurrent/ConcurrentQueues/OfferRemoveLoops.java fails 
> Intermittently
> 
>  
> 
> Hi Tristan,
> 
>  
> 
> (thanks for working on my old crappy tests, and apologies for always giving 
> you a hard time)
> 
>  
> 
> I couldn't find the bug report.  Can you provide a URL?
> 
>  
> 
> Thread.stop is only called in case the test has already failed (presumed 
> "hung"), as a last resort.
> 
>  
> 
> Perhaps the timeout used in the test (300 ms) is too small on some systems?
> 
>  
> 
> +            protected void giveupCPU(){
> +                try {
> +                    Thread.sleep(0L);
> +                } catch (InterruptedException ignore) {}
>              }
> 
>  
> 
> I know of no reason why Thread.sleep(0) should be any more effective than 
> Thread.yield.  If it was more effective, then why wouldn't Thread.yield on 
> that platform be fixed to be implemented exactly the same way?  IOW if this 
> is a problem, fix the JDK!
> 
>  
> 
>              /** Polls occasionally for quitting time. */
>              protected boolean quittingTime(long i) {
> -                return (i % 1024) == 0 && quittingTime();
> +                return stopRequest || quittingTime() && (i % 1024 == 0 || i 
> < 1024);
> +            }
> 
>  
> 
> You've completely broken the intent of this code, which was to poll 
> ***occasionally*** for quitting time.
> 
>  
> 
>  
> 
> On Sun, Feb 23, 2014 at 1:40 AM, Tristan Yan <HYPERLINK 
> "mailto:tristan....@oracle.com"tristan....@oracle.com> wrote:
> 
> Hi All
> Could you please help to review fix for JDK-803174.
> 
> http://cr.openjdk.java.net/~tyan/JDK-8031374/webrev.00/
> 
> Description:
> There are a couple of code change for this code.
> 1. Original code has used known problematic Thread.stop. Which could cause 
> ThreadDeath as all we know. I change it to a customized stopThread method.
> 2. Test failed in windows, I analyze the failure by changing Thread.yield() 
> in remover thread to Thread.sleep(50). This is a simulation for slow machine. 
> Which shows exact same failures as we can see in bug description. By adding 
> more debug info, we can see although we tried to give up CPU by using 
> Thread.yield().(I have tried to use Thread.sleep(1L) as well, the result is 
> same), there is no guarantee that os will pick up a new thread to execute. In 
> Windows, this is more obvious. Because the execution is slow. even when the 
> timeout happens, offer thread and remove thread won’t have chance to get the 
> point that i % 1024 == 0. This will cause the failure like we see in the log. 
> My fix here is when the timeout happens, but i is still less than 1024. Stop 
> offer thread and remover thread right away instead letting them continuously 
> wait the point to i == 1024.
> 3. I replace Thread.yield to Thread.sleep(0L). I saw a couple of discussion 
> that Thread.yield is not required to give up CPU.
> 
> Thank you
> Tristan 
> 
>  
> 
>  
> 
>  

Reply via email to