I created a Jira issue with a proposed patch and the test case included below.

https://issues.apache.org/jira/browse/RIVER-397

Chris
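For readers following the thread, the general idea of replacing an unbounded wait with a bounded, configurable one can be sketched roughly as below. This is not the patch attached to RIVER-397 and not the actual Mux code: the class `BoundedHandshake`, the method `handshakeArrived()`, the property name `example.mux.handshakeTimeoutMillis`, and the 15 second default are all hypothetical, chosen only for illustration.

```java
import java.io.IOException;

/** Hypothetical stand-in for a connection whose start() waits for a peer handshake. */
final class BoundedHandshake {
    /** Hypothetical property name; NOT a real River/Jini setting. */
    static final String TIMEOUT_PROPERTY = "example.mux.handshakeTimeoutMillis";

    private final Object lock = new Object();
    private boolean handshakeReceived = false;

    /** Would be called by a reader thread when the peer's first message arrives. */
    void handshakeArrived() {
        synchronized (lock) {
            handshakeReceived = true;
            lock.notifyAll();
        }
    }

    /** Waits for the handshake; throws IOException once the timeout has elapsed. */
    void start() throws IOException {
        // Read the timeout on each call so a test can shorten it (default is an assumption).
        long timeoutMillis = Long.getLong(TIMEOUT_PROPERTY, 15000L);
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        synchronized (lock) {
            // Loop guards against spurious wakeups; recompute the remaining time each pass.
            while (!handshakeReceived) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    throw new IOException("no handshake within " + timeoutMillis + " ms");
                }
                try {
                    lock.wait(remainingMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    throw new IOException("interrupted while waiting for handshake", e);
                }
            }
        }
    }
}
```

With a property like this, a test could set the timeout to a few tens of milliseconds instead of joining the worker thread for 20 seconds. Whether the real fix should read a system property, a constructor argument, or a configuration entry is a design question this sketch does not try to answer.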
-----Original Message-----
From: Christopher Dolan [mailto:[email protected]]
Sent: Wednesday, May 04, 2011 1:05 PM
To: [email protected]
Subject: RE: client hang in com.sun.jini.jeri.internal.mux.Mux.start()

Here's a test that consistently fails with the current Mux implementation and passes with the patch I proposed at the beginning of this thread. In my test I explicitly pretend that the server side of the connection has blocked. In reality, all we need to agree on is that it's possible for the server side to block.

The proposed patch needs a little more work to make the timeout configurable. Once it is, the test can be sped up by setting that timeout to something unrealistically short.

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;
    import java.util.concurrent.atomic.AtomicBoolean;

    import org.testng.Assert;
    import org.testng.annotations.Test;

    import com.sun.jini.jeri.internal.mux.MuxClient;

    public class MuxStartTimeout {
        @Test
        public void test() throws IOException, InterruptedException {
            // make fake input and output streams.
            OutputStream os = new ByteArrayOutputStream();
            InputStream is = new InputStream() {
                @Override
                public synchronized int read() throws IOException {
                    try {
                        // block indefinitely
                        while (true)
                            wait();
                    } catch (InterruptedException e) {
                        return 0;
                    }
                }
            };

            final AtomicBoolean finished = new AtomicBoolean(false);
            final AtomicBoolean succeeded = new AtomicBoolean(false);
            final AtomicBoolean failed = new AtomicBoolean(false);
            final MuxClient muxClient = new MuxClient(os, is);
            try {
                // run start() on its own thread so the test can bound how long it waits
                Thread t = new Thread(new Runnable() {
                    public void run() {
                        try {
                            muxClient.start();
                            succeeded.set(true);
                        } catch (IOException e) {
                            failed.set(true);
                        }
                        finished.set(true);
                    }
                });
                t.start();
                t.join(20000);

                // start() must have given up with an IOException rather than hanging
                Assert.assertTrue(finished.get());
                Assert.assertFalse(succeeded.get());
                Assert.assertTrue(failed.get());
                if (!t.isInterrupted())
                    t.interrupt();
            } finally {
                muxClient.shutdown("end of test");
            }
        }
    }

Chris

P.S.
Amusingly, I actually compiled the test against org.testng.annotations.Test and org.testng.Assert, but it should also work as written against org.junit.Test and org.junit.Assert.

-----Original Message-----
From: Patricia Shanahan [mailto:[email protected]]
Sent: Wednesday, May 04, 2011 11:24 AM
To: [email protected]
Subject: Re: client hang in com.sun.jini.jeri.internal.mux.Mux.start()

This raises a more general question that has been troubling me: What should we do about theoretical deadlocks and similar concurrency issues that have not been demonstrated in practice?

On the one hand, I like to have a test to show that a change really fixed something. On the other hand, a concurrency problem can contribute to general flakiness without ever reaching the point of being reported as a bug or having a test that demonstrates it.

Patricia

On 5/4/2011 8:47 AM, Christopher Dolan wrote:
...
> I haven't conclusively witnessed that specific deadlock, but I've had a
> closely related problem where another process coincidentally grabs port
> 4160 before Reggie gets it. This happens because Win2k, WinXP and Win2k3
> use 1024-5000 for their dynamic port range, contrary to IANA
> recommendations. I suspect the deadlock described above happens in real
> life, but I've never gotten detailed enough logs to prove it, just
> client stack traces showing the hang in Mux.start().
...
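The port collision described in the quoted message can be illustrated with a small check. This is only a sketch: the class `PortCheck` and its method names are made up for this example, and the 1024-5000 range and port 4160 are taken directly from the message above rather than verified independently.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical helper; names are illustrative and not part of River. */
final class PortCheck {
    // Dynamic port range attributed to Win2k, WinXP and Win2k3 in the quoted message.
    static final int LEGACY_WINDOWS_DYNAMIC_MIN = 1024;
    static final int LEGACY_WINDOWS_DYNAMIC_MAX = 5000;
    // Port the message says Reggie tries to obtain.
    static final int DISCOVERY_PORT = 4160;

    /** True if the OS could hand this port to an unrelated process as an ephemeral port. */
    static boolean inLegacyWindowsDynamicRange(int port) {
        return port >= LEGACY_WINDOWS_DYNAMIC_MIN && port <= LEGACY_WINDOWS_DYNAMIC_MAX;
    }

    /** Best-effort probe: tries to bind the port and releases it again immediately. */
    static boolean isFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false; // already bound by another process, or binding not permitted
        }
    }
}
```

`PortCheck.inLegacyWindowsDynamicRange(PortCheck.DISCOVERY_PORT)` is true, which is the overlap the message points out. A probe like `isFree` can only report the situation at one instant; another process may still grab the port between the probe and the real bind, so it is useful for diagnostics and logging rather than as a fix.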
