On Mon, Jan 18, 2016 at 3:34 PM, Andreas Lundblad <andreas.lundb...@oracle.com> wrote:
>> >> Interesting observation. The code for waiting for valid port file values
>> >> basically looks like
>> >>
>> >>     for (int i = 0; i < 10; i++) {
>> >>         checkPortFile();
>> >>         if (successful)
>> >>             break;
>> >>         sleep(500);
>> >>     }
>> >>
>> >> so the fact that it even reaches 6676 ms looks suspicious when it comes to
>> >> load.
>> >
>> > Why? Under load those sleep(500)'s might not return for much longer; and
>> > the whole thing might be preempted at any point for an extended period of
>> > time.
>
> What I meant was that the fact that the code takes 6676 ms to complete
> increases my suspicion about it being due to a load issue.
>
>> What about putting another loop around this loop which prints a
>> warning to stdout (e.g. "..trying to connect to sjavac server for X
>> seconds") for another five or so times. We could also print the system
>> load [1] although I'm not sure it's worth it. On our AIX machines it
>> is often a network/NFS problem which causes long startup times of new
>> executables and this won't be observable by looking at the system load
>> (but it may at least give a hint).
>>
>> [1]
>> http://docs.oracle.com/javase/6/docs/api/java/lang/management/OperatingSystemMXBean.html#getSystemLoadAverage%28%29
>
> I don't think a loop around the loop is necessary. The actual code (which is
> slightly different from the snippet I posted above) already prints a message
> between each attempt.
>
> I was thinking of just bumping the timeout from 5 seconds to, say, 60
> seconds. If it's a load issue, we should see something like "Port file values
> found after 9000 ms", in which case we know for sure that it was a premature
> timeout issue. If no port files materialize after >60 seconds, we can
> probably safely assume that the issue is due to something else.
>
Sounds good. Let's try it.

> -- Andreas
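
For illustration, here is a rough sketch of what the longer wait could look like. This is not the actual sjavac code: the class name PortFileWait, the method waitFor and the BooleanSupplier standing in for checkPortFile() are all made up for the example. It polls against a wall-clock deadline instead of counting iterations, prints a message between attempts, and reports how long the port file values took to show up.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of sjavac.
class PortFileWait {

    // Polls 'check' until it returns true or 'timeoutMs' has elapsed.
    // Returns the elapsed time in ms on success, or -1 on timeout/interrupt.
    static long waitFor(BooleanSupplier check, long timeoutMs, long intervalMs) {
        long start = System.nanoTime();
        while (true) {
            boolean ok = check.getAsBoolean();
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            if (ok) {
                System.out.println("Port file values found after " + elapsedMs + " ms");
                return elapsedMs;
            }
            if (elapsedMs >= timeoutMs) {
                System.out.println("No port file values after " + elapsedMs + " ms, giving up");
                return -1;
            }
            System.out.println("Port file values not valid yet (" + elapsedMs + " ms), retrying");
            try {
                // Never sleep past the deadline.
                Thread.sleep(Math.min(intervalMs, timeoutMs - elapsedMs));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for checkPortFile(): succeeds on the third attempt.
        int[] calls = {0};
        long elapsed = waitFor(() -> ++calls[0] >= 3, 60_000, 500);
        System.out.println("attempts=" + calls[0] + ", elapsed=" + elapsed + " ms");
    }
}
```

Measuring elapsed time rather than counting sleep(500) calls means a loaded machine shows up directly in the "found after N ms" message, which is the number we want to see in the logs.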