Adam Thornton writes:
> On Wed, 2004-02-25 at 12:42, McKown, John wrote:
> > If you do not recommend the "soft" option (at least for R/W), what else is
> > possible? If the NFS server "dies" or is unavailable for some reason, does
> > that mean that all the client boxes which use it should die as well?
>
> Yes.
>
> If you're mounting files you need to have read-write, and the underlying
> filesystem goes away, you absolutely do not want to continue operations
> with the files you have open. If you do keep going, e.g. with a soft
> mount, you're looking at Data Corruption City.
To expand on this a little: there are two independent two-way choices for
"how do I want the NFS filesystem to behave when it stops behaving like the
local filesystem it's pretending to be?". One choice is soft v. hard; the
other is intr v. nointr. The defaults are hard and nointr. The four
combinations have the following properties:

hard,nointr
    The default. Makes the filesystem behave (a little more) like a local
    filesystem, in the sense that a read or write of n bytes will wait
    uninterruptibly until it has fully succeeded or failed[*].

hard,intr
    The useful alternative. Weakens the pretence of local filesystem
    semantics, but only a little. If an interrupt (SIGINT, Ctrl/C, ...)
    occurs "during" a read(), it returns with errno EINTR or a short read
    (not sure if NFS will actually do the latter). This doesn't usually
    confuse applications, since EINTR must be handled anyway in case it
    arrives "just before" the read, and if the application is designed to
    cope with reading from terminals, pipes or devices then it needs to
    cope with short reads anyway. An EINTR in the middle of a write() is a
    bit nastier, since you don't know what happened server-side (but then
    if you cared about exactly what data is on the server you'd either
    take more care of the NFS server or not use NFS).

soft,nointr (or soft,intr, I suppose)
    This weakens the pretence of a normal local filesystem even more, at
    least insofar as people trust "quality of implementation" as well as
    the letter of the law. If the NFS server times out (either because
    it's down, or because the network's congested, or because various
    timeout values have been tweaked), then the read()/write() returns
    with errno EIO, meaning an I/O error. Now, many applications follow
    the methodology of "if you can't handle it, don't test for it" and
    others follow the methodology of "being coded by a lazy git who
    doesn't even test for errors", in which case your data is toast.
Yes, it would also be toast if the local filesystem started giving I/O
errors, but such things are normally handled at a different level (shout
at whoever implemented the RAID solution and/or the hardware vendor).

Of the choices available, "hard,intr" tends to give much more useful and
safe semantics than "soft" but, even so, needs careful thought and effort
which could have been avoided by putting more effort into making the NFS
server reliable in the first place. A default "hard" mount will pick up
the read/write transparently when the server comes back up again, given
the statelessness of NFS[*], so it's only "long" outages that matter.

--Malcolm

[*] Yes, those are lies, but close enough for this explanation.

-- 
Malcolm Beattie <[EMAIL PROTECTED]>
Linux Technical Consultant, IBM EMEA Enterprise Server Group...
...from home, speaking only for myself
