I'm doing some independent cleanup and checking into jsr166 CVS: This gets rid of the Thread.stop and introduces a Semaphore that can be waited on instead of spinning. Try to work from this new version. If you can never reproduce the failure to verify a fix, I suggest leaving the bug unfixed until we can. Presumably you have a Windows machine in a lab somewhere that produced this originally?
http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/test/jtreg/util/concurrent/ConcurrentQueues/OfferRemoveLoops.java?view=co Index: OfferRemoveLoops.java =================================================================== RCS file: /export/home/jsr166/jsr166/jsr166/src/test/jtreg/util/concurrent/ConcurrentQueues/OfferRemoveLoops.java,v retrieving revision 1.7 diff -u -r1.7 OfferRemoveLoops.java --- OfferRemoveLoops.java 18 Jan 2013 04:23:28 -0000 1.7 +++ OfferRemoveLoops.java 24 Feb 2014 01:57:03 -0000 @@ -32,6 +32,7 @@ import java.util.*; import java.util.concurrent.*; import java.util.concurrent.atomic.*; +import static java.util.concurrent.TimeUnit.*; @SuppressWarnings({"unchecked", "rawtypes", "deprecation"}) public class OfferRemoveLoops { @@ -71,9 +72,10 @@ final long timeoutMillis = 10L * 1000L; final int maxChunkSize = 1042; final int maxQueueSize = 10 * maxChunkSize; + final CountDownLatch done = new CountDownLatch(3); - /** Poor man's bounded buffer. */ - final AtomicLong approximateCount = new AtomicLong(0L); + /** Poor man's bounded buffer; prevents unbounded queue expansion. */ + final Semaphore offers = new Semaphore(maxQueueSize); abstract class CheckedThread extends Thread { CheckedThread(String name) { @@ -89,42 +91,44 @@ protected boolean quittingTime(long i) { return (i % 1024) == 0 && quittingTime(); } - protected abstract void realRun(); + protected abstract void realRun() throws Exception; public void run() { try { realRun(); } catch (Throwable t) { unexpected(t); } } } Thread offerer = new CheckedThread("offerer") { - protected void realRun() { - final long chunkSize = getRandom().nextInt(maxChunkSize) + 2; + protected void realRun() throws InterruptedException { + final int chunkSize = getRandom().nextInt(maxChunkSize) + 20; long c = 0; - for (long i = 0; ! quittingTime(i); i++) { + while (! quittingTime()) { if (q.offer(Long.valueOf(c))) { if ((++c % chunkSize) == 0) { - approximateCount.getAndAdd(chunkSize); - while (approximateCount.get() > maxQueueSize) - Thread.yield(); + offers.acquire(chunkSize); } } else { Thread.yield(); - }}}}; + } + } + done.countDown(); + }}; Thread remover = new CheckedThread("remover") { protected void realRun() { - final long chunkSize = getRandom().nextInt(maxChunkSize) + 2; + final int chunkSize = getRandom().nextInt(maxChunkSize) + 20; long c = 0; - for (long i = 0; ! quittingTime(i); i++) { + while (! quittingTime()) { if (q.remove(Long.valueOf(c))) { if ((++c % chunkSize) == 0) { - approximateCount.getAndAdd(-chunkSize); + offers.release(chunkSize); } } else { Thread.yield(); } } q.clear(); - approximateCount.set(0); // Releases waiting offerer thread + offers.release(1<<30); // Releases waiting offerer thread + done.countDown(); }}; Thread scanner = new CheckedThread("scanner") { @@ -139,18 +143,19 @@ break; } Thread.yield(); - }}}; + } + done.countDown(); + }}; - for (Thread thread : new Thread[] { offerer, remover, scanner }) { - thread.join(timeoutMillis + testDurationMillis); - if (thread.isAlive()) { - System.err.printf("Hung thread: %s%n", thread.getName()); - failed++; - for (StackTraceElement e : thread.getStackTrace()) - System.err.println(e); - // Kludge alert - thread.stop(); - thread.join(timeoutMillis); + if (! done.await(timeoutMillis + testDurationMillis, MILLISECONDS)) { + for (Thread thread : new Thread[] { offerer, remover, scanner }) { + if (thread.isAlive()) { + System.err.printf("Hung thread: %s%n", thread.getName()); + failed++; + for (StackTraceElement e : thread.getStackTrace()) + System.err.println(e); + thread.interrupt(); + } } } } On Sun, Feb 23, 2014 at 4:23 PM, Tristan Yan <tristan....@oracle.com> wrote: > Thank you for reviewing this. Martin > > > > The original bug report is here: > https://bugs.openjdk.java.net/browse/JDK-8031374. > > > > I haven’t reproduced this bug but I can simulate to reproduce this failure > by changing yield() in remover thread to Thread.sleep(20). > > You have commented “You've completely broken the intent of this code, > which was to poll ***occasionally*** for quitting time”. I tried to not > changed the logic for the test. This failure comes when only 1strounds(1024 > loop) wasn’t finished in given quitting time(before timeout). > Which was 300ms. One solution is increasing default timeout as you > suggested. But question is how big should we increase. When the test is > really slow and could not finish 1st round(1024 loop) before timeout, I > couldn’t figure out what’s the reason timeout value. Also this definitely > will slow down the tests when it run in fast machine. Which is the case > most of time. So I took other step that skip wait if test doesn't finish 1 > st round(1024 loop) before timeout. I won’t say I completely broke the > intent of the code here because it’s rare case.(Only if the machine is slow > and slow enough that the test doesn't finish 1st round before timeout). > > > > The reason that replace Thread.yield to Thread.sleep(0L) because David > Holmes point out in the bug “sleep will certainly give up the CPU, > whereas yield is not required to.” Also in other mail, David pointed I > should use Thread.sleep(10L) instead of Thread.sleep(0L) preferably a bit > longer as we don't know how the OS will behave if the sleep time requested > is less than the natural resolution of the sleep mechanism. For the > experiment I’ve done in Windows. Thread.yeild() or Thread.sleep(10L) won’t > guarantee current thread give up the CPU. This is a hint to OS. This makes > in windows remover and offer thread could wait to each other more more than > other other operating system. This is also the one of the reason that can > explain why we've seen this in windows only. > > > > Regards > > Tristan > > > > > > > > *发件人**:* Martin Buchholz [mailto:marti...@google.com] > *发送时间:* Monday, February 24, 2014 3:47 AM > *收件人:* Tristan Yan > *抄送:* core-libs-dev > *主题:* Re: RFR for JDK-8031374: > java/util/concurrent/ConcurrentQueues/OfferRemoveLoops.java fails > Intermittently > > > > Hi Tristan, > > > > (thanks for working on my old crappy tests, and apologies for always > giving you a hard time) > > > > I couldn't find the bug report. Can you provide a URL? > > > > Thread.stop is only called in case the test has already failed (presumed > "hung"), as a last resort. > > > > Perhaps the timeout used in the test (300 ms) is too small on some systems? > > > > + protected void giveupCPU(){ > > + try { > > + Thread.sleep(0L); > > + } catch (InterruptedException ignore) {} > > } > > > > I know of no reason why Thread.sleep(0) should be any more effective than > Thread.yield. If it was more effective, then why wouldn't Thread.yield on > that platform be fixed to be implemented exactly the same way? IOW if this > is a problem, fix the JDK! > > > > /** Polls occasionally for quitting time. */ > > protected boolean quittingTime(long i) { > > - return (i % 1024) == 0 && quittingTime(); > > + return stopRequest || quittingTime() && (i % 1024 == 0 || i > < 1024); > > + } > > > > You've completely broken the intent of this code, which was to poll > ***occasionally*** for quitting time. > > > > > > On Sun, Feb 23, 2014 at 1:40 AM, Tristan Yan <tristan....@oracle.com> > wrote: > > Hi All > Could you please help to review fix for JDK-803174. > > http://cr.openjdk.java.net/~tyan/JDK-8031374/webrev.00/ > > Description: > There are a couple of code change for this code. > 1. Original code has used known problematic Thread.stop. Which could cause > ThreadDeath as all we know. I change it to a customized stopThread method. > 2. Test failed in windows, I analyze the failure by changing > Thread.yield() in remover thread to Thread.sleep(50). This is a simulation > for slow machine. Which shows exact same failures as we can see in bug > description. By adding more debug info, we can see although we tried to > give up CPU by using Thread.yield().(I have tried to use Thread.sleep(1L) > as well, the result is same), there is no guarantee that os will pick up a > new thread to execute. In Windows, this is more obvious. Because the > execution is slow. even when the timeout happens, offer thread and remove > thread won’t have chance to get the point that i % 1024 == 0. This will > cause the failure like we see in the log. My fix here is when the timeout > happens, but i is still less than 1024. Stop offer thread and remover > thread right away instead letting them continuously wait the point to i == > 1024. > 3. I replace Thread.yield to Thread.sleep(0L). I saw a couple of > discussion that Thread.yield is not required to give up CPU. > > Thank you > Tristan > > >