> >> Interesting observation. The code for waiting for valid port file values
> >> basically looks like
> >>
> >>     for (int i = 0; i < 10; i++) {
> >>         checkPortFile();
> >>         if (successful)
> >>             break;
> >>         sleep(500);
> >>     }
> >>
> >> so the fact that it even reaches 6676 ms looks suspicious when it comes to
> >> load.
> >
> >
> > Why? Under load those sleep(500)'s might not return for much longer; and the
> > whole things might be time preempted at any point for an extended period of
> > time.

What I meant was that the fact that the code takes 6676 ms to complete
increases my suspicion that this is a load issue.

> What about putting another loop around this loop which prints a
> warning to stdout (e.g. "..trying to connect to sjavac server since X
> seconds") for another five or so times. We could also print the system
> load [1] although I'm not sure it's worth it. On our AIX machines it
> is often a network/NFS problem which causes long startup times of new
> executables and this won't be observable by looking at the system load
> (but it may at least give a hint).
>
> [1]
> http://docs.oracle.com/javase/6/docs/api/java/lang/management/OperatingSystemMXBean.html#getSystemLoadAverage%28%29

I don't think a loop around the loop is necessary. The actual code (which
is slightly different from the snippet I posted above) already prints a
message between attempts.

I was thinking of just bumping the timeout from 5 seconds to, say, 60
seconds. If it's a load issue, we should see something like "Port file
values found after 9000 ms", in which case we know for sure that it was a
premature timeout issue. If no port files materialize after >60 seconds,
we can probably safely assume that the issue is due to something else.

-- Andreas
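
For illustration, here is a minimal sketch of the proposal above: the same
polling loop, but bounded by a deadline (e.g. 60 seconds) rather than a fixed
attempt count, and reporting the elapsed time when it finishes. This is not
the actual sjavac code; the class PortFilePoller, the method waitForPortFile,
and the BooleanSupplier standing in for the port file check are all
hypothetical names chosen for the example.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of sjavac.
class PortFilePoller {

    /**
     * Polls 'check' until it returns true or 'timeoutMs' has elapsed.
     * Returns the elapsed time in ms on success, or -1 on timeout.
     */
    static long waitForPortFile(BooleanSupplier check, long timeoutMs, long intervalMs)
            throws InterruptedException {
        long start = System.nanoTime();
        while (true) {
            // Measure real elapsed time; under load, sleep() may overshoot.
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            if (check.getAsBoolean()) {
                System.out.println("Port file values found after " + elapsedMs + " ms");
                return elapsedMs;
            }
            if (elapsedMs >= timeoutMs) {
                System.out.println("No port file values found after " + elapsedMs + " ms");
                return -1;
            }
            System.out.println("Port file values not yet valid (" + elapsedMs + " ms elapsed)");
            Thread.sleep(intervalMs);
        }
    }
}
```

A call such as waitForPortFile(check, 60_000, 500) would correspond to the
60 second timeout with the 500 ms sleep from the quoted snippet. Measuring
wall-clock time instead of counting iterations means the log line shows how
long the wait really took, which is what would distinguish a premature
timeout from some other failure.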