/* HINT: Search archives @ http://www.indyramp.com/masq/ before posting!
/* ALSO: Don't quote this header. It makes you look lame :-) */

I am having problems reading NFS filesystems on a cluster of machines
that use masquerading to present themselves as a single host to an NFS
server.

Initially, my MASQ setup functions correctly. I can successfully mount
NFS disks from behind the gateway, and running tcpdump on the MASQ
gateway shows the NFS RPC requests being correctly masqueraded and the
replies correctly demasqueraded:

[node29 is an NFS client using masquerading, valm is the MASQ gateway,
and nutmeg/lmb is the NFS server]

13:01:11.725170 node29.biop.ox..3993899405 > nutmeg.biop.ox..nfs: 104 getattr fh GFIA/4
13:01:11.725219 valm.biop.ox.ac.3993899405 > nutmeg.biop.ox..nfs: 104 getattr fh GFIA/4
13:01:11.734738 lmb.biop.ox.ac..nfs > valm.biop.ox.ac.3993899405: reply ok 96 (DF)
13:01:11.734757 lmb.biop.ox.ac..nfs > node29.biop.ox..3993899405: reply ok 96 (DF)

During high network load the cluster's NFS clients lose the NFS server
and do not recover, giving the usual NFS errors in the client logs:

Jun 18 17:38:04 node9 kernel: nfs: server nutmeg.biop.ox.ac.uk still not responding
Jun 18 17:38:08 node9 kernel: nfs: task 73528 can't get a request slot

Running tcpdump on the MASQ gateway seems to indicate that the problem
might be due to the demasquerading of NFS 'reply ok' messages, which are
sent when the NFS server is able to complete requests. According to
tcpdump, the reply is demasqueraded incorrectly and sent to the wrong
client.

[node9 and node13 are both clients, valm is the MASQ gateway, and
nutmeg/lmb is the NFS server]

13:01:04.296112 node9.biop.ox.a.3873443981 > nutmeg.biop.ox..nfs: 68 null
13:01:04.296127 valm.biop.ox.ac.3873443981 > nutmeg.biop.ox..nfs: 68 null
13:01:04.298254 lmb.biop.ox.ac..nfs > valm.biop.ox.ac.3873443981: reply ok 24
13:01:04.298274 lmb.biop.ox.ac..nfs > node13.biop.ox..3873443981: reply ok 24

In the above case, the request is sent from node9 but the reply is
returned to node13.

No other network services seem to be affected; I can still ftp, ssh,
ping, etc. from the masqueraded machines even when NFS is down.

Configuration:

The client machines and MASQ gateway are all running kernel 2.2.19. As
well as the masqueraded nodes, one of the machines on the internal
network has a 1:1 NAT to make it visible to the outside world. The NFS
server is hosted on a VMS cluster.

/sbin/ip route add nat 163.1.16.54 via 192.168.1.113
/sbin/ip rule add prio 320 from 192.168.1.113 nat 163.1.16.54

/sbin/ipchains -P forward DENY
/sbin/ipchains -A forward -i eth0 -s 163.1.16.54 -j ACCEPT
/sbin/ipchains -A forward -i eth1 -d 192.168.1.113 -j ACCEPT
/sbin/ipchains -A forward -i eth0 -s 192.168.1.0/24 -j MASQ

If anyone has any suggestions on how to remedy the problem I would be
grateful. Full packet dumps etc. are available on request.

Regards,

Guy Coates

----------------------------------------------------------------
--A fool and his money are soon venture capital.
Tel  : +44 (0)1865 275390 (W)
       +44 (0)7801 710224 (M)
Mail : Laboratory of Molecular Biophysics, University of Oxford,
       Rex Richards Building, South Parks Road, Oxford, OX1 3QU
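
The misrouted reply in the second trace can also be picked out of a longer dump mechanically. What follows is only a minimal sketch, assuming tcpdump's one-line NFS summaries in the format shown in the traces, where the number after the host name is the RPC transaction ID rather than a port (as the tcpdump man page describes for NFS). The names find_misrouted and short_host, and the gateway argument, are hypothetical and not part of any existing tool.

```python
import re

# Matches tcpdump's one-line summary: "<time> <src> > <dst>: <rest>"
LINE_RE = re.compile(r"^(\S+)\s+(\S+)\s+>\s+(\S+):\s+(.*)$")


def short_host(endpoint):
    """Split 'node9.biop.ox.a.3873443981' into ('node9', '3873443981').

    tcpdump truncates long host names, so only the first label is kept.
    """
    host, _, tag = endpoint.rpartition(".")
    return host.split(".")[0], tag


def find_misrouted(lines, gateway):
    """Return (xid, requester, recipient) for replies sent to the wrong client.

    Assumption: the dump was taken on the MASQ gateway, so every request and
    reply shows up twice (once with the client address, once with the
    gateway address). Lines involving only the gateway are skipped.
    """
    requester = {}  # xid -> first masqueraded client seen sending it
    misrouted = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        _, src, dst, _ = m.groups()
        src_host, src_tag = short_host(src)
        dst_host, dst_tag = short_host(dst)
        if dst_tag == "nfs" and src_host != gateway:
            # Request as it left the client, before masquerading.
            requester.setdefault(src_tag, src_host)
        elif src_tag == "nfs" and dst_host != gateway:
            # Reply after demasquerading: it should go back to the requester.
            expected = requester.get(dst_tag)
            if expected is not None and expected != dst_host:
                misrouted.append((dst_tag, expected, dst_host))
    return misrouted


TRACE = [
    "13:01:04.296112 node9.biop.ox.a.3873443981 > nutmeg.biop.ox..nfs: 68 null",
    "13:01:04.296127 valm.biop.ox.ac.3873443981 > nutmeg.biop.ox..nfs: 68 null",
    "13:01:04.298254 lmb.biop.ox.ac..nfs > valm.biop.ox.ac.3873443981: reply ok 24",
    "13:01:04.298274 lmb.biop.ox.ac..nfs > node13.biop.ox..3873443981: reply ok 24",
]

if __name__ == "__main__":
    for xid, wanted, got in find_misrouted(TRACE, gateway="valm"):
        print(f"xid {xid}: requested by {wanted}, reply delivered to {got}")
```

Run against the four lines of the second trace, this reports transaction 3873443981 as requested by node9 but delivered to node13. It only matches on the transaction ID, so two clients that happen to reuse the same ID would confuse it.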
