Hi Roger,
On Wed, Nov 12, 2014 at 1:57 PM, roger riggs <roger.ri...@oracle.com> wrote: > Hi Martin, > > I updated the webrev to use long timeouts for the test in question; they > can't be too long > because the spawned sleep is only alive for 10 seconds. I am looking at the "sleep" command invocation, but it's an indefinite sleep. I don't see 10 seconds anywhere. > Take a look please: > > http://cr.openjdk.java.net/~rriggs/webrev-basic-debug-8043477/ 2313 Integer exitValue = null; 2314 try { 2315 exitValue = p.exitValue(); 2316 } catch (IllegalStateException ise) { 2317 // no exitValue 2318 } 2319 if ((end - start) > 6 * 1_000_000_000) 2320 fail("Test failed: waitFor took too long on a dead process. (" + (end - start) + "ns)" 2321 + ", exitValue: " + Objects.requireNonNull(exitValue)); exitValue throws IllegalThreadStateException, not IllegalStateException. I would fail separately in case of ITSE. It looks instead like you'll fail with uninformative NPE. 2301 p.waitFor(1000, TimeUnit.MILLISECONDS); 2304 if ((end - start) < 500_000_000) I am surprised the test is not testing that the initial timed waitFor doesn't wait at least the specified time, instead of only half. I consider returning from waitFor early a bug. > > Thanks, Roger > > > On 11/12/2014 4:45 PM, Martin Buchholz wrote: >> >> Looking over the test again (I was the original author IIRC), >> I think this test still has lots of value with a longer timeout. >> >> Experience seems to show that 200ms is not a long enough timeout for >> "system" operations to use in non-flaky tests, including everything in >> Process handling. >> I say we ncrease the timeout by at least 10x and move on. >> >> On Wed, Nov 12, 2014 at 12:16 PM, roger riggs <roger.ri...@oracle.com> >> wrote: >>> >>> Hi Martin, Rob, >>> >>> I thought the point of the test was to verify the timeout logic in >>> waitFor(timeout). >>> Adding the prints without changing the test criteria was intended to >>> gather >>> data >>> about the distribution. >>> >>> A long timeout would still catch a failure in the reaper/waitFor >>> communication >>> so maybe that's ok. >>> >>> As for printing the time, I would switch to the newer >>> java.time.Instant/Duration >>> which have a decent toString. >>> >>> Roger >>> >>> >>> >>> On 11/12/2014 3:02 PM, Rob McKenna wrote: >>>> >>>> Eesh, Sorry Roger, I have something like this on my todo. >>>> >>>> Martin, my concern with the long delay approach is that it effectively >>>> nullifies the point of the test. Given that this test has been flakey >>>> the >>>> approach has been to simply bump the acceptable delta by another 100ms >>>> or so >>>> every time. Since we've had to do this repeatedly however, I'm beginning >>>> to >>>> question whether the test is more trouble than its worth. >>>> >>>> -Rob >>>> >>>> On 12/11/14 19:44, Martin Buchholz wrote: >>>>> >>>>> The print statement below seems redundant with the assertion failure >>>>> message. >>>>> You could improve the assertion message instead if need be. >>>>> Adding thousand separator underscores to 200000000L would help a >>>>> little. >>>>> >>>>> I like my little helper >>>>> >>>>> static long millisElapsedSince(long startNanoTime) { >>>>> return NANOSECONDS.toMillis(System.nanoTime() - >>>>> startNanoTime); >>>>> } >>>>> >>>>> >>>>> + System.out.printf(" waitFor process: delta: %d%n",(end - >>>>> start) ); >>>>> + >>>>> if ((end - start) > 200000000L * (AIX.is() ? 2 : 1)) >>>>> fail("Test failed: waitFor took too long (" + (end - >>>>> start) + "ns)"); >>>>> >>>>> 200 ms timeout for subprocesses to finish is just too damn low. In >>>>> j.u.c. tests we switched to 10 seconds for most "long" timeouts >>>>> (LONG_DELAY_MS) and are happy with the disappearance of rare flaky >>>>> results. >>>>> >>>>> >>>>> On Wed, Nov 12, 2014 at 11:34 AM, roger riggs <roger.ri...@oracle.com> >>>>> wrote: >>>>>> >>>>>> Please review test changes to ProcessBuilder Basic.java to add some >>>>>> debugging information. >>>>>> The test has been failing intermittently. The wait times have been >>>>>> extended >>>>>> to see >>>>>> if the systems are just slow. The failure criteria have not changed. >>>>>> >>>>>> Suggestions welcome. >>>>>> >>>>>> Webrev: >>>>>> http://cr.openjdk.java.net/~rriggs/webrev-basic-debug-8043477/ >>>>>> >>>>>> issue: >>>>>> 8043477: java/lang/ProcessBuilder/Basic.java failed with: >>>>>> java.lang.AssertionError: Some tests failed >>>>>> >>>>>> Thanks, Roger >>>>>> >>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8043477 >>>>>> >