> Hey Folks,

Hi.

> Recently I had a problem where a *nix system NFS was hung on a server
> which had "gone away," but the client hadn't umounted the filesystem.

Yep, normal so far....

> Later, this caused a script in cron to fail, in that a df command inside
> the script never completed, and instead it "hung," causing the script
> to hang awaiting a completion of df, which it never got.

True, true....

> 999 times out of 1000 this has not failed, but when that one time
> comes along, all hell breaks loose.

It wouldn't necessarily have had to happen.

> I'm not sure what approach to take to alleviate the cascading failure.
> I'd prefer to just abort the df, log the error, and complete the rest
> of the script.  Short of totally re-writing the script (it's not mine,
> to begin with), I would like to modify it.  It's a simple system command
> being used:
>
>       system ("/usr/sbin/df -kl");

Why not check whether the NFS mount has gone away before trying to use it?
Maybe just try an ls or something similar on it, run in a way that gives you
a failure notice rather than hanging your process. I'm not sure what the best
approach is, but you should really check the system command for success or
failure. The script would still run slowly while it waits for a timeout, but
it wouldn't necessarily have to hang....
Google for "NFS timeouts", for example, to see what could be done for testing.
For example, is the server you mounted the NFS share from still running? A
first test could possibly be to see whether the NFS server is still there,
and then to test the NFS share itself.
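
Checking the command and giving up on it after a while could look roughly
like the sketch below. It assumes the script is Perl (the system() call
looks like it). run_with_timeout is a name I made up, and the 30 second
limit is arbitrary. Like system(), it lets df write straight to the
script's own output.

```perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);

# Hypothetical helper (not from the original script): run a command, but
# stop waiting for it after $timeout seconds.
# Returns ($exit_code, undef) if it finished, or (undef, $reason) if not.
sub run_with_timeout {
    my ($timeout, @cmd) = @_;

    my $pid = fork();
    return (undef, "fork failed: $!") unless defined $pid;

    if ($pid == 0) {
        # Child: become the command; exit 127 if it cannot be started.
        exec { $cmd[0] } @cmd or POSIX::_exit(127);
    }

    my $status;
    my $finished = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $timeout;
        waitpid($pid, 0);
        $status = $?;
        alarm 0;
        1;
    };

    unless ($finished) {
        alarm 0;
        die $@ unless $@ eq "timeout\n";   # re-throw anything unexpected
        kill 'KILL', $pid;
        waitpid($pid, WNOHANG);            # reap it if it is already gone
        return (undef, "timed out after $timeout seconds");
    }

    return (undef, 'killed by signal ' . ($status & 127)) if $status & 127;
    return ($status >> 8, undef);
}

# In place of: system ("/usr/sbin/df -kl");
my ($exit, $reason) = run_with_timeout(30, '/usr/sbin/df', '-kl');
if (defined $reason) {
    warn "df skipped: $reason\n";
} elsif ($exit != 0) {
    warn "df exited with status $exit\n";
}
```

On a hard NFS mount the stuck df may not die, even on SIGKILL, until the
server comes back. The script itself carries on either way, because it no
longer blocks waiting for the child.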

One place to look for more info could be:

        http://nfs.sourceforge.net/

It all depends on your setup and how the mounts are done etc.
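
With that caveat, the "is the NFS server still there" test mentioned above
might be sketched like this. It assumes Perl again, and that the server
accepts NFS over TCP (port 2049 is the usual one, but your setup may
differ). nfs_server_reachable is a made-up name.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical check: can we open a TCP connection to the given host and
# port within $timeout seconds? Returns 1 if so, 0 otherwise.
# This only shows that something is listening there, not that the
# exported share itself is healthy.
sub nfs_server_reachable {
    my ($host, $port, $timeout) = @_;

    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $timeout,
    );
    return 0 unless $sock;

    close $sock;
    return 1;
}
```

The script could call something like nfs_server_reachable($server, 2049, 5)
with the real server name, and log and skip the df when it returns 0.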

> Ideas?

Just the ones above.. ;)

//Anders//

