On Oct 17, 2007, at 9:31 AM, Ethan Mallove wrote:

>> Either or both of those would be fine (don't we have a timeout in
>> DoCommand.pm already?).
>
> There is a timeout in DoCommand, but currently I keep
> reinvoking DoCommand on each "interrupted system call", so
> the timeout gets reset each time. This would not be the case
> if the do-while loop were to go in DoCommand.

Ah -- I see what you're saying.  Good point -- I agree.

> Then again, an infinite loop is certain in the case of a
> command that is *expected* to output "interrupted system call".

But only if that command *always* outputs "interrupted system call". So yes, I'm a bit paranoid about an unlikely corner case. But we might as well handle it on the off chance that it happens (and print a noisy error message so that you can tell it happened, since that likely means something is wrong with your cluster infrastructure).

And bang on your OS guys to fix the real problem while you're at it. ;-)
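For illustration, here is a minimal sketch of the retry loop discussed above. It is written in Python rather than the Perl of DoCommand.pm and assumes the loop lives inside the command-running helper. Two things from the thread are shown. First, the deadline is computed once, so retries share one overall timeout instead of resetting it. Second, the number of retries is capped and a loud warning is printed when the cap is hit. The names `run_with_retries` and `max_retries`, and the plain substring match on the output, are hypothetical and are not MTT's actual API. Python has no do-while, so a `while True` loop with early returns stands in for it.

```python
import subprocess
import sys
import time

# Hypothetical marker; real code may need a stricter match.
EINTR_MARKER = "interrupted system call"


def run_with_retries(cmd, timeout, max_retries=10):
    """Run cmd, retrying while its output mentions an interrupted system call.

    Hypothetical helper, not MTT's DoCommand API. The deadline is fixed
    before the first attempt, so all retries share one overall timeout.
    Returns (returncode, combined_output, attempts).
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(
                f"{cmd!r} exceeded {timeout}s after {attempts - 1} attempt(s)")
        # subprocess.run raises TimeoutExpired if this attempt overruns
        # whatever is left of the shared deadline.
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=remaining)
        output = proc.stdout + proc.stderr
        if EINTR_MARKER not in output.lower():
            return proc.returncode, output, attempts
        if attempts >= max_retries:
            # Noisy on purpose: hitting the cap probably means something is
            # wrong with the cluster, not with the command.
            print(f"*** WARNING: {cmd!r} reported '{EINTR_MARKER}' on all "
                  f"{attempts} attempts; giving up", file=sys.stderr)
            return proc.returncode, output, attempts
```

A command that always prints "interrupted system call" now stops after `max_retries` attempts with a warning, rather than looping forever.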

--
Jeff Squyres
Cisco Systems
