On Wed, 12 Apr 2006, Mike Marion wrote:
> One of the last problems we're having with autofs4 is the fact that we
> see a lot of cases where a "remote" job can't see files in nfs.
...snip...
> programs they run will stat a file looking for it's input, or binary, or
> whatever, then it's like the mount doesn't happen, or isn't atomic
> enough, and the program will fail. If we immediately log into the host
> and do something like ls the same file.. it works fine.

From the log file you showed, it looks like autofs tried to do the right
thing, but the NFS mount was attempted and failed after a 20 second
timeout. So that's the tree I'm barking up.

A. At night when we do backups, we overload our network (despite various
off-topic mitigation strategies), and we regularly get autofs/NFS mount
failures similar to what you describe. I'd estimate that the probability
of a mount failing, during stormy network conditions, is 0.5% to 1%.
Maybe less. It's time for some more mitigation $trategies.

B. Most machines are using a not-too-ancient distro release (SuSE 9.2,
same as yours), and NFS transport is by TCP by default. But some have
not yet been upgraded, and are using the traditional UDP. I have the
impression that those are a lot more likely to have a mount failure.
With UDP the only way to detect a trashed packet is by a timeout,
typically 30 secs unless you've reduced it, whereas with TCP the
timeouts are much shorter and auditing is more strict, so a trashed
packet is often recognized without a timeout.

Not to say that your problem is surely the same as ours, but it would be
worth looking into these two issues.

We're using autofs-4.1.3 and kernel-smp-2.6.8-24.20 -- wasn't 2.6.5 used
in SuSE 9.1 rather than 9.2? Anyway, your setup and ours are very
similar, and we have very reliable performance (except sometimes) from
autofs/NFS in the same general usage style as yours, i.e. "power users"
execute on their workstations with homedirs on servers, and a Load
Sharing Facility (Sun Grid Engine) runs their jobs without the user
actually logging into the compute node.

James F. Carter          Voice 310 825 2897    FAX 310 206 6673
UCLA-Mathnet;  6115 MSA; 405 Hilgard Ave.; Los Angeles, CA, USA 90095-1555
Email: [EMAIL PROTECTED]  http://www.math.ucla.edu/~jimc (q.v. for PGP key)
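
One possible mitigation for the transient mount failures discussed in this message is to have the job wrapper retry the first access to its input before giving up. The following is only a rough sketch, not something taken from either site's setup. It assumes that a later access to the same path prompts the automounter to attempt the mount again, which the observation that a manual ls right afterwards works suggests but does not prove. The function name wait_for_path and its attempts and delay parameters are hypothetical.

```python
import os
import time


def wait_for_path(path, attempts=5, delay=2.0, stat=os.stat, sleep=time.sleep):
    """Return True once `path` can be stat'ed, retrying on OSError.

    Hypothetical helper: `attempts` and `delay` are illustrative values,
    not tuned for any real network. `stat` and `sleep` are injectable
    so the retry logic can be exercised without a real NFS mount.
    """
    for attempt in range(1, attempts + 1):
        try:
            # Accessing the path is what gives autofs a chance to mount it
            # (assumption: a failed mount is retried on a later access).
            stat(path)
            return True
        except OSError:
            # Pause before the next try, but not after the final one.
            if attempt < attempts:
                sleep(delay)
    return False
```

A job script could call wait_for_path() on its input file or binary and start the real work only if it returns True. The total wait is bounded by (attempts - 1) * delay seconds plus however long each failed access itself blocks.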
