On Fri, 3 Dec 2010 15:12:28 -0600 Steve French <[email protected]> wrote:
> On Fri, Dec 3, 2010 at 2:21 PM, Volker Lendecke
> <[email protected]> wrote:
>
> > On Fri, Dec 03, 2010 at 01:50:11PM -0500, Jeff Layton wrote:
> > > > Probably needs two tests. One to see what happens if the (single)
> > > > connection is lost, and another to see what happens if a single
> > > > operation takes a very, very long time to complete (as you describe).
> > > >
> > >
> > > I did an experiment with this on win2k8. I first doctored an smbd to
> > > discard write requests. When I try to copy a file to this host (via
> > > copy.exe), the server usually waits a little while (the time seems to
> > > vary between 30-60s or so), sends a single echo request and then
> > > reconnects the socket if it still doesn't get a write reply in about
> > > 30s. copy.exe then says "The specified network name is no longer
> > > available." Heh.
> > >
> > > That said, the behavior seems to be really inconsistent. In at least
> > > one case, no echo was sent and the socket was shut down <30s after the
> > > write request was sent.
> > >
> > > The timeout before sending an echo also seems to vary quite a bit. My
> > > suspicion is that that indicates that the client has the echo ping on a
> > > separate timer, and just selectively sends it whenever the timer pops
> > > based on certain criteria.
> >
> > Probably all this timeout stuff varies too much with
> > different application behaviours. I have the same discussion
> > right now with the opposite direction: How can a server
> > reliably tell that a client died hard? The question here is:
> > When can we reliably throw away share mode entries? A
> > colleague just measured a W2k8 timeout of 5 minutes in this
> > case, but is this dependable? I suspect we have to develop
> > our own policies for this.
> >
>
> A loosely related question is whether POSIX forbids
> EIO or EHOSTDOWN on some syscalls. If such were
> specified in the standard, at least for those syscalls posix
> clients can never time out (or must timeout and either
> cancel/resubmit and/or reconnect transparently)
> Currently write beyond end of file (and operations on
> offline files) are the only known special cases where timeout would
> be inappropriate, but we may find other syscalls where it
> would be inappropriate for a client to return to the user.
>

EHOSTDOWN is not a valid return for all filesystem-based syscalls in
POSIX. In a quick grep of the Linux manpages, it looks like it's only a
valid return code for accept():

[jlay...@tlielax man2]$ pwd
/usr/share/man/man2
[jlay...@tlielax man2]$ zgrep EHOSTDOWN ./*
./accept.2.gz:.BR EHOSTDOWN ,

That's hardly authoritative for POSIX, but I'd be quite surprised if
it's incorrect. EIO on the other hand is allowed almost everywhere
(since it's such a non-specific error code).

I think Volker's correct. The spec really isn't going to be
particularly helpful in this regard, though understanding Windows'
behavior is an interesting datapoint for developing our own policies.

Treating different calls differently for timeouts sounds like the road
to special-case madness. It seems to me that the best behavior would be
to have the client wait for a reply indefinitely if the server is
responding to periodic echoes. If that's unacceptable then perhaps a
tunable timeout that defaults to something very long (10 minutes or so).

> For Windows (Windows behavior may be slightly different
> than POSIX but still important for implementers to understand)
> it would be helpful to know which operations
> are allowed to return errors to the user (if the host
> hangs or goes down) and which must retry forever.
>

-- 
Jeff Layton <[email protected]>
_______________________________________________
cifs-protocol mailing list
[email protected]
https://lists.samba.org/mailman/listinfo/cifs-protocol
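
For illustration only, here is a rough Python sketch of the policy floated in the message above: keep waiting on a request as long as the server is still answering periodic echoes, with an optional tunable hard timeout that defaults to something long. Every name here is hypothetical, and the 60s echo window is an assumption (the thread only mentions "10 minutes or so" for the hard timeout). Giving up once echoes go unanswered is also an assumption about what the client would do. This is not how any existing client is implemented.

```python
from typing import Optional

# Hypothetical tunables. Only the "10 minutes or so" figure comes from the
# thread; the echo window is an assumption made for this sketch.
DEFAULT_HARD_TIMEOUT = 600.0  # seconds
DEFAULT_ECHO_WINDOW = 60.0    # seconds without any server reply


def should_give_up(
    request_sent_at: float,
    last_server_reply: float,
    now: float,
    hard_timeout: Optional[float] = DEFAULT_HARD_TIMEOUT,
    echo_window: float = DEFAULT_ECHO_WINDOW,
) -> bool:
    """Decide whether to stop waiting for a reply to one request.

    last_server_reply is the time any reply (including an echo reply)
    last arrived. hard_timeout=None means "wait indefinitely while the
    server keeps answering echoes".
    """
    # Assumption: a server that has been silent for longer than the echo
    # window is treated as dead, so the caller should reconnect.
    if now - last_server_reply > echo_window:
        return True
    # Server is alive. With no hard timeout, wait forever.
    if hard_timeout is None:
        return False
    # Server is alive, but this request has been pending too long.
    return now - request_sent_at > hard_timeout


if __name__ == "__main__":
    # Request pending for an hour, server answered an echo 10s ago.
    print(should_give_up(0.0, 3590.0, 3600.0, hard_timeout=None))
    print(should_give_up(0.0, 3590.0, 3600.0))
```

The same check applies to every call, which avoids treating different syscalls differently; only the two tunables change the behavior.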
