On Mon, Jan 18, 2016 at 3:34 PM, Andreas Lundblad
<andreas.lundb...@oracle.com> wrote:
>> >> Interesting observation. The code for waiting for valid port file values
>> >> basically looks like
>> >>
>> >>      for (int i = 0; i < 10; i++) {
>> >>          checkPortFile();
>> >>          if (successful)
>> >>              break;
>> >>          sleep(500);
>> >>      }
>> >>
>> >> so the fact that it even reaches 6676 ms looks suspicious when it comes to
>> >> load.
>> >
>> >
>> > Why? Under load those sleep(500)'s might not return for much longer; and
>> > the whole thing might be preempted at any point for an extended period
>> > of time.
>
> What I meant was that the fact that the code takes 6676 ms to complete 
> increases my suspicion about it being due to a load issue.
>
>
>> What about putting another loop around this loop which prints a
>> warning to stdout (e.g. "..trying to connect to sjavac server since X
>> seconds") for another five or so times. We could also print the system
>> load [1] although I'm not sure it's worth it. On our AIX machines it
>> is often a network/NFS problem which causes long startup times of new
>> executables and this won't be observable by looking at the system load
>> (but it may at least give a hint).
>>
>> [1] http://docs.oracle.com/javase/6/docs/api/java/lang/management/OperatingSystemMXBean.html#getSystemLoadAverage%28%29
>
> I don't think a loop around the loop is necessary. The actual code (which is 
> slightly different from the snippet I posted above) already prints a message 
> between each attempt.
>
> I was thinking of just bumping the timeout from 5 seconds to, say, 60 
> seconds. If it's a load issue, we should see something like "Port file values 
> found after 9000 ms", in which case we know for sure that it was a premature 
> timeout issue. If no port files materialize after >60 seconds, we can 
> probably safely assume that the issue is due to something else.
>

Sounds good. Let's try it.
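
For illustration, a minimal sketch of a deadline-based wait along those
lines. This is not the actual sjavac code: the class name, the
waitForPortFile helper, its parameters and the exact message texts are
assumptions made up for this example, and the port file check is
abstracted into a BooleanSupplier.

```java
import java.util.function.BooleanSupplier;

class PortFileWait {

    /**
     * Polls `check` every `pollMs` milliseconds until it succeeds or
     * `timeoutMs` has elapsed. Returns the elapsed time in milliseconds
     * on success, or -1 if the timeout was reached.
     * (Hypothetical helper, not part of sjavac.)
     */
    static long waitForPortFile(BooleanSupplier check, long timeoutMs, long pollMs)
            throws InterruptedException {
        long start = System.nanoTime();
        while (true) {
            boolean successful = check.getAsBoolean();
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            if (successful) {
                // The elapsed time tells us whether the old 5 s limit was too short.
                System.out.println("Port file values found after " + elapsedMs + " ms");
                return elapsedMs;
            }
            if (elapsedMs >= timeoutMs) {
                System.out.println("No valid port file values after " + elapsedMs + " ms");
                return -1;
            }
            System.out.println("Port file values not valid yet after " + elapsedMs + " ms");
            Thread.sleep(pollMs);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in check that succeeds on the third attempt.
        int[] attempts = {0};
        waitForPortFile(() -> ++attempts[0] >= 3, 60_000, 500);
    }
}
```

Measuring against a deadline rather than counting iterations means a
sleep(500) that overruns under load still counts toward the 60 seconds,
and the logged elapsed time shows directly whether the old 5 second
limit was the problem.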

> -- Andreas
