Tristan, Can you please file a new bug and bring in the changes that Martin pushed to the 166 CVS. http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/test/jtreg/util/concurrent/ConcurrentQueues/OfferRemoveLoops.java?view=co
Create a changeset, and I can do the OpenJDK review. -Chris. On 24 Feb 2014, at 03:03, Tristan Yan <tristan....@oracle.com> wrote: > I am ok with this unfix for now. Martin. And thank you for bringing the > improvement to jsr166 CVS. > > Regards > > Tristan > > > > 发件人: Martin Buchholz [mailto:marti...@google.com] > 发送时间: Monday, February 24, 2014 10:59 AM > 收件人: Tristan Yan > 抄送: core-libs-dev > 主题: Re: 答复: 答复: RFR for JDK-8031374: > java/util/concurrent/ConcurrentQueues/OfferRemoveLoops.java fails > Intermittently > > > > Hi Tristan, > > > > I don't think we should change the test without a sufficiently solid reason, > which usually means reproducing the failure, even if e.g. only once in 1000 > times. Here we don't know which queue implementation is causing the failure, > and we don't know what chunk size was being used, which might cause the > failure to be reproduced more reliably. A single test failure might be due > to cosmic rays! Let's leave unfixed this while it cannot be reproduced. You > can check in my general purpose improvements to the test from jsr166 CVS. > > > > On Sun, Feb 23, 2014 at 6:41 PM, Tristan Yan <HYPERLINK > "mailto:tristan....@oracle.com"tristan....@oracle.com> wrote: > > Hi Martin > > This test failed once in our nightly test. So this is a real case failure. > But unfortunately we couldn’t know more except the log we had. > > 10 extra seconds may need serve 1024 loop totally, also in windows > Thread.yield() doesn’t give up CPU most of times, then we can may have the > situation that remover is trying to remove object from a empty queue but it > doesn’t find anything a couple of times(it doesn’t give up cpu time) then > offer thread get cpu time. And offer thread insert a couple of elements and > queue is full. Offer thread tries to give up CPU but Thread.yield() doesn’t > really give up. Let’s assume it takes 1024 loop again. And offer thread got > timeout. Then remover thread try to remove elements, and it takes another > 1024 loop to get to timeout as well. So 10 seconds may need distribute to > 2048 loop at most. Every loop has 5 ms foreach. That’s why I simulate this > case with a Thread.sleep(20). I admit I can’t reproduce it without changing > code, this is all theoretical analysis. But this is closest possible reason > that I can deduce. > > Thank you. > > Tristan > > > > 发件人: Martin Buchholz [mailto:HYPERLINK > "mailto:marti...@google.com"marti...@google.com] > 发送时间: Monday, February 24, 2014 10:16 AM > > > 收件人: Tristan Yan > 抄送: core-libs-dev > > 主题: Re: 答复: RFR for JDK-8031374: > java/util/concurrent/ConcurrentQueues/OfferRemoveLoops.java fails > Intermittently > > > > I may very well be missing something, but the actual extra timeout for > threads to finish is 10 whole seconds, which should be long enough to process > a single chunk, even on Windows. > > > > If you can repro this consistently, try to find out which queue > implementation is affected. > > > > We can also shrink max queue size and chunk size to reduce time to traverse > the queue elements. > > > > On Sun, Feb 23, 2014 at 4:23 PM, Tristan Yan <HYPERLINK > "mailto:tristan....@oracle.com"tristan....@oracle.com> wrote: > > Thank you for reviewing this. Martin > > > > The original bug report is here: > https://bugs.openjdk.java.net/browse/JDK-8031374. > > > > I haven’t reproduced this bug but I can simulate to reproduce this failure by > changing yield() in remover thread to Thread.sleep(20). > > You have commented “You've completely broken the intent of this code, which > was to poll ***occasionally*** for quitting time”. I tried to not changed the > logic for the test. This failure comes when only 1st rounds(1024 loop) wasn’t > finished in given quitting time(before timeout). Which was 300ms. One > solution is increasing default timeout as you suggested. But question is how > big should we increase. When the test is really slow and could not finish 1st > round(1024 loop) before timeout, I couldn’t figure out what’s the reason > timeout value. Also this definitely will slow down the tests when it run in > fast machine. Which is the case most of time. So I took other step that skip > wait if test doesn't finish 1st round(1024 loop) before timeout. I won’t say > I completely broke the intent of the code here because it’s rare case.(Only > if the machine is slow and slow enough that the test doesn't finish 1st round > before timeout). > > > > The reason that replace Thread.yield to Thread.sleep(0L) because David Holmes > point out in the bug “sleep will certainly give up the CPU, whereas yield is > not required to.” Also in other mail, David pointed I should use > Thread.sleep(10L) instead of Thread.sleep(0L) preferably a bit longer as we > don't know how the OS will behave if the sleep time requested is less than > the natural resolution of the sleep mechanism. For the experiment I’ve done > in Windows. Thread.yeild() or Thread.sleep(10L) won’t guarantee current > thread give up the CPU. This is a hint to OS. This makes in windows remover > and offer thread could wait to each other more more than other other > operating system. This is also the one of the reason that can explain why > we've seen this in windows only. > > > > Regards > > Tristan > > > > > > > > 发件人: Martin Buchholz [mailto:HYPERLINK > "mailto:marti...@google.com"marti...@google.com] > 发送时间: Monday, February 24, 2014 3:47 AM > 收件人: Tristan Yan > 抄送: core-libs-dev > 主题: Re: RFR for JDK-8031374: > java/util/concurrent/ConcurrentQueues/OfferRemoveLoops.java fails > Intermittently > > > > Hi Tristan, > > > > (thanks for working on my old crappy tests, and apologies for always giving > you a hard time) > > > > I couldn't find the bug report. Can you provide a URL? > > > > Thread.stop is only called in case the test has already failed (presumed > "hung"), as a last resort. > > > > Perhaps the timeout used in the test (300 ms) is too small on some systems? > > > > + protected void giveupCPU(){ > + try { > + Thread.sleep(0L); > + } catch (InterruptedException ignore) {} > } > > > > I know of no reason why Thread.sleep(0) should be any more effective than > Thread.yield. If it was more effective, then why wouldn't Thread.yield on > that platform be fixed to be implemented exactly the same way? IOW if this > is a problem, fix the JDK! > > > > /** Polls occasionally for quitting time. */ > protected boolean quittingTime(long i) { > - return (i % 1024) == 0 && quittingTime(); > + return stopRequest || quittingTime() && (i % 1024 == 0 || i > < 1024); > + } > > > > You've completely broken the intent of this code, which was to poll > ***occasionally*** for quitting time. > > > > > > On Sun, Feb 23, 2014 at 1:40 AM, Tristan Yan <HYPERLINK > "mailto:tristan....@oracle.com"tristan....@oracle.com> wrote: > > Hi All > Could you please help to review fix for JDK-803174. > > http://cr.openjdk.java.net/~tyan/JDK-8031374/webrev.00/ > > Description: > There are a couple of code change for this code. > 1. Original code has used known problematic Thread.stop. Which could cause > ThreadDeath as all we know. I change it to a customized stopThread method. > 2. Test failed in windows, I analyze the failure by changing Thread.yield() > in remover thread to Thread.sleep(50). This is a simulation for slow machine. > Which shows exact same failures as we can see in bug description. By adding > more debug info, we can see although we tried to give up CPU by using > Thread.yield().(I have tried to use Thread.sleep(1L) as well, the result is > same), there is no guarantee that os will pick up a new thread to execute. In > Windows, this is more obvious. Because the execution is slow. even when the > timeout happens, offer thread and remove thread won’t have chance to get the > point that i % 1024 == 0. This will cause the failure like we see in the log. > My fix here is when the timeout happens, but i is still less than 1024. Stop > offer thread and remover thread right away instead letting them continuously > wait the point to i == 1024. > 3. I replace Thread.yield to Thread.sleep(0L). I saw a couple of discussion > that Thread.yield is not required to give up CPU. > > Thank you > Tristan > > > > > >