On Mon, Jan 18, 2016 at 8:35 AM, David Holmes <david.hol...@oracle.com> wrote:
> On 18/01/2016 3:08 AM, Andreas Lundblad wrote:
>>
>> On Fri, Jan 15, 2016 at 10:59:08AM +0100, Volker Simonis wrote:
>>>
>>> Maybe the timeout of five seconds is too small?
>>>
>>> Our AIX boxes are not the fastest and we also have a lot of stuff on NFS
>>> shares.
>>>
>>> I recently saw this one:
>>>
>>> [CLIENT] Exception caught: java.io.IOException: No port file values
>>> materialized. Giving up after 6676 ms
>>>
>>> Normally the timeout should be just about a little more than 5 seconds
>>> but here the exception reports more than 6 seconds which might be a
>>> hint that the machine was severely overloaded.
>>>
>>> Regards,
>>> Volker
>>
>>
>> Interesting observation. The code for waiting for valid port file values
>> basically looks like
>>
>>     for (int i = 0; i < 10; i++) {
>>         checkPortFile();
>>         if (successful)
>>             break;
>>         sleep(500);
>>     }
>>
>> so the fact that it even reaches 6676 ms looks suspicious when it comes to
>> load.
>
>
> Why? Under load those sleep(500)'s might not return for much longer; and the
> whole things might be time preempted at any point for an extended period of
> time.

Yes, exactly. What about putting another loop around this loop which
prints a warning to stdout (e.g. "..trying to connect to sjavac server
for X seconds") for another five or so times?

We could also print the system load [1], although I'm not sure it's worth
it. On our AIX machines it is often a network/NFS problem that causes
long startup times of new executables, and this won't be observable by
looking at the system load (but it may at least give a hint).

[1] http://docs.oracle.com/javase/6/docs/api/java/lang/management/OperatingSystemMXBean.html#getSystemLoadAverage%28%29

>
> David
> -----
>
>
>> -- Andreas
>>
>
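
To make the suggestion concrete, here is a rough sketch of the outer warning loop wrapped around the existing polling loop. It is not the actual sjavac code: the class name PortFileWait, the method waitForPortFile, its parameters, and the BooleanSupplier standing in for checkPortFile() are all made up for illustration. The only real API used for the load figure is OperatingSystemMXBean.getSystemLoadAverage(), which returns a negative value on platforms where the load average is not available.

```java
import java.lang.management.ManagementFactory;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of sjavac: polls a check in an inner loop and
// prints a warning after every unsuccessful round of the outer loop.
final class PortFileWait {

    // 'check' stands in for checkPortFile(); all parameter names are assumptions.
    // Returns true as soon as the check succeeds, false if every round fails
    // or the waiting thread is interrupted.
    static boolean waitForPortFile(BooleanSupplier check, int outerRounds,
                                   int innerTries, long sleepMs) {
        long start = System.nanoTime();
        for (int round = 0; round < outerRounds; round++) {
            for (int i = 0; i < innerTries; i++) {
                if (check.getAsBoolean()) {
                    return true;
                }
                try {
                    Thread.sleep(sleepMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
            // Measure real elapsed time: under load the sleeps may overrun.
            long elapsedSec = (System.nanoTime() - start) / 1_000_000_000L;
            // Negative if the platform cannot report a load average.
            double load = ManagementFactory.getOperatingSystemMXBean()
                                           .getSystemLoadAverage();
            System.out.println("..trying to connect to sjavac server for "
                    + elapsedSec + " seconds (system load average: " + load + ")");
        }
        return false;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated check that only succeeds on the fourth poll.
        boolean ok = waitForPortFile(() -> ++calls[0] >= 4, 5, 2, 10);
        System.out.println("connected: " + ok);
    }
}
```

With waitForPortFile(check, 5, 10, 500) this would keep the current ten polls at 500 ms per round and give up after five rounds, printing one warning per failed round. As noted, the load figure would not reveal an NFS stall, so it is only a hint.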