/* HINT: Search archives @ http://www.indyramp.com/masq/ before posting!
/* ALSO: Don't quote this header. It makes you look lame :-) */

Guy Coates wrote:
> I am having problems reading NFS filesystems on a cluster of machines
> which are using masquerading to present themselves as a single host to an
> NFS server.
>
> Initially, my MASQ setup functions correctly. I can successfully mount NFS
> disks from behind the gateway, and running tcpdump on the MASQ gateway
> shows the NFS RPC requests being correctly masqueraded and demasqueraded:
>
> [node29 is an NFS client using masquerading, valm is the MASQ gateway, and
> nutmeg/lmb is the NFS server]
>
> 13:01:11.725170 node29.biop.ox..3993899405 > nutmeg.biop.ox..nfs: 104 getattr fh GFIA/4
> 13:01:11.725219 valm.biop.ox.ac.3993899405 > nutmeg.biop.ox..nfs: 104 getattr fh GFIA/4
> 13:01:11.734738 lmb.biop.ox.ac..nfs > valm.biop.ox.ac.3993899405: reply ok 96 (DF)
> 13:01:11.734757 lmb.biop.ox.ac..nfs > node29.biop.ox..3993899405: reply ok 96 (DF)
>
> During high network load the cluster NFS clients will unrecoverably lose
> the NFS server, giving the usual NFS errors in the client logs.
>
> Jun 18 17:38:04 node9 kernel: nfs: server nutmeg.biop.ox.ac.uk still not responding
> Jun 18 17:38:08 node9 kernel: nfs: task 73528 can't get a request slot
>
> Running tcpdump on the MASQ gateway seems to indicate that the problem
> might be due to the demasquerading of NFS 'reply ok' messages, sent when
> the NFS server is able to complete requests. According to tcpdump, the
> reply is demasqueraded incorrectly and sent to the wrong client.
>
> [node9 and node13 are both clients, valm is the MASQ gateway, and
> nutmeg/lmb is the NFS server]
>
> 13:01:04.296112 node9.biop.ox.a.3873443981 > nutmeg.biop.ox..nfs: 68 null
> 13:01:04.296127 valm.biop.ox.ac.3873443981 > nutmeg.biop.ox..nfs: 68 null
> 13:01:04.298254 lmb.biop.ox.ac..nfs > valm.biop.ox.ac.3873443981: reply ok 24
> 13:01:04.298274 lmb.biop.ox.ac..nfs > node13.biop.ox..3873443981: reply ok 24
>
> In the above case, the request is sent from node9 but the reply is
> returned to node13. No other network services seem to be affected; I can
> still ftp, ssh, ping etc. from the masqueraded machines even when NFS is
> down.
>
> Configuration:
>
> The client machines and MASQ gateway are all running kernel 2.2.19. As
> well as the masqueraded nodes, one of the machines on the internal network
> has a 1:1 NAT to make it visible to the outside world. The NFS server is
> hosted on a VMS cluster.
>
> /sbin/ip route add nat 163.1.16.54 via 192.168.1.113
> /sbin/ip rule add prio 320 from 192.168.1.113 nat 163.1.16.54
>
> /sbin/ipchains -P forward DENY
> /sbin/ipchains -A forward -i eth0 -s 163.1.16.54 -j ACCEPT
> /sbin/ipchains -A forward -i eth1 -d 192.168.1.113 -j ACCEPT
> /sbin/ipchains -A forward -i eth0 -s 192.168.1.0/24 -j MASQ
>
> If anyone has any suggestions on how to remedy the problem I would be
> grateful. Full packet dumps etc. are available on request.
>
> Regards,
>
> Guy Coates

Might the demasquerading fail because the masqueraded packets go to
"nutmeg" but the reply packets to be demasqueraded come from "lmb"?
I take it that they are separate IP addresses. If masquerading relies
on the entire socket pair, this might be the problem. But if it were,
you'd probably find that it never worked.

raf
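
To make the socket-pair question concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that the gateway keys each masquerade entry on the gateway port plus the server's address and port. The MasqTable class, the gateway port 61000, and the use of host names in place of IP addresses are all hypothetical; this is not the 2.2 kernel's masquerading code. Port 2049 is the standard NFS port.

```python
from typing import Dict, Optional, Tuple

# Hypothetical lookup key: (gateway port, server address, server port).
Key = Tuple[int, str, int]


class MasqTable:
    """Toy masquerade table keyed on the full socket pair (an assumption)."""

    def __init__(self) -> None:
        self.entries: Dict[Key, str] = {}

    def outbound(self, client: str, gw_port: int,
                 server: str, server_port: int) -> None:
        # Record which internal client owns this gateway port for this server.
        self.entries[(gw_port, server, server_port)] = client

    def inbound(self, gw_port: int, source: str,
                source_port: int) -> Optional[str]:
        # A reply matches only if its source equals the recorded server.
        return self.entries.get((gw_port, source, source_port))


table = MasqTable()
# node9 sends an NFS request to nutmeg through the gateway.
table.outbound("node9", 61000, "nutmeg", 2049)

# Reply from the address the request was sent to: lookup succeeds.
reply_same = table.inbound(61000, "nutmeg", 2049)
# Reply from a different address (lmb): no entry matches.
reply_other = table.inbound(61000, "lmb", 2049)

print(reply_same)
print(reply_other)
```

Under this assumption a reply from "lmb" would simply find no entry, which is why such a setup would be expected never to work at all. The sketch models only the lookup miss; it does not explain why the trace shows the reply reaching node13.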
