After 5 minutes of inactivity, a Linux NFSv3 client mounted with proto=tcp
against an OpenBSD nfsd server appears to hang. A packet trace shows the
client's FIN being ACKed by the OpenBSD server, but the server's end of the
connection remains in CLOSE_WAIT.

On the client, an RPC idle timer expires and calls a "graceful"
shutdown(SHUT_RDWR) on the TCP connection, expecting the server to complete
the close, which it never does.
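
To illustrate what the server side sees in this situation, here is a minimal
userland sketch. It is not NFS or kernel code: the function name
recv_after_peer_shutdown() is made up for this example, and an AF_UNIX stream
pair stands in for the TCP connection. After one end calls
shutdown(SHUT_RDWR), the other end reads end-of-file, but its own socket stays
open until it closes it itself.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns what recv() reports on the peer after
 * the other end calls shutdown(SHUT_RDWR). 0 means end-of-file, -1 an
 * error. An AF_UNIX stream pair stands in for the TCP connection.
 */
ssize_t
recv_after_peer_shutdown(void)
{
	int sv[2];
	char buf[16];
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return -1;

	/* "Client" end: graceful shutdown; the descriptor stays open. */
	if (shutdown(sv[0], SHUT_RDWR) == -1) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	/* "Server" end: reads EOF, and must still close its own socket. */
	n = recv(sv[1], buf, sizeof(buf), 0);

	close(sv[1]);
	close(sv[0]);
	return n;
}
```

The close(sv[1]) at the end is the step that corresponds to what the server
is not doing in the trace above.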

While we wait for Linux to fix this with a harder close() or another
timeout, the OpenBSD NFS server could be kind and handle the
client-side shutdown, perhaps like this:

--- sys/nfs/nfs_socket.c.orig   Sun Nov 29 14:23:18 2020
+++ sys/nfs/nfs_socket.c        Sun Nov 29 15:15:55 2020
@@ -1581,7 +1581,7 @@
                error = soreceive(so, &nam, &auio, &mp, NULL,
                    &flags, 0);
                if (error || mp == NULL) {
-                       if (error == EWOULDBLOCK)
+                       if (error == EWOULDBLOCK && !(so->so_state & SS_CANTRCVMORE))
                                slp->ns_flag |= SLP_NEEDQ;
                        else
                                slp->ns_flag |= SLP_DISCONN;

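The decision the patched condition makes can be sketched as a standalone
function. This is only an illustration of the branch logic, not kernel code:
ex_classify(), EX_SS_CANTRCVMORE, EX_NEEDQ and EX_DISCONN are made-up names,
and the flag value is a placeholder rather than the kernel's definition. It
models the branch taken once soreceive() has returned an error or no data.

```c
#include <errno.h>

/* Placeholder flag value for illustration; not the kernel's definition. */
#define EX_SS_CANTRCVMORE 0x01

enum ex_action { EX_NEEDQ, EX_DISCONN };

/*
 * Hypothetical model of the patched branch: retry later only when the
 * receive would block AND the peer can still send; otherwise treat the
 * connection as disconnected.
 */
enum ex_action
ex_classify(int error, int so_state)
{
	if (error == EWOULDBLOCK && !(so_state & EX_SS_CANTRCVMORE))
		return EX_NEEDQ;
	return EX_DISCONN;
}
```

With the unpatched condition, the EWOULDBLOCK case would be queued for retry
even when the peer has already shut down its side.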

I'll also note that the usual workaround of switching the client to UDP is
getting harder, as recent Linux kernels disable NFS over UDP by default.
Another workaround is to stat the mount dir on the client every 4 minutes.
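
A rough sketch of that stat workaround as a small C helper follows. The names
probe_mount() and keepalive() are invented for this example, and any path
passed in (say /mnt/nfs) is an assumption about where the export is mounted.
Whether each stat() actually produces traffic on the wire presumably depends
on the client's attribute cache settings, so treat this as a starting point.

```c
#include <sys/stat.h>
#include <unistd.h>

/* One probe: stat the path. Returns 0 on success, -1 on error. */
int
probe_mount(const char *path)
{
	struct stat sb;

	return stat(path, &sb);
}

/*
 * Probe `path` `count` times, sleeping `interval` seconds between
 * probes. Returns the number of probes that failed.
 */
int
keepalive(const char *path, unsigned int interval, int count)
{
	int i, failed = 0;

	for (i = 0; i < count; i++) {
		if (probe_mount(path) == -1)
			failed++;
		if (i + 1 < count)
			sleep(interval);
	}
	return failed;
}
```

Calling keepalive(path, 240, n) from a small daemon or a cron-driven wrapper
would match the 4 minute interval mentioned above.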
