I am having problems reading NFS filesystems on a cluster of machines
which are using masquerading to present themselves as a single host to an
NFS server.

Initially, my MASQ setup functions correctly. I can successfully mount NFS
disks from behind the gateway, and running tcpdump on the MASQ gateway
shows the NFS RPC requests being correctly masqueraded and demasqueraded:

[node29 is an NFS client using masquerading, valm is the MASQ gateway, and
nutmeg/lmb is the NFS server]

13:01:11.725170 node29.biop.ox..3993899405 > nutmeg.biop.ox..nfs: 104 getattr fh GFIA/4
13:01:11.725219 valm.biop.ox.ac.3993899405 > nutmeg.biop.ox..nfs: 104 getattr fh GFIA/4
13:01:11.734738 lmb.biop.ox.ac..nfs > valm.biop.ox.ac.3993899405: reply ok 96 (DF)
13:01:11.734757 lmb.biop.ox.ac..nfs > node29.biop.ox..3993899405: reply ok 96 (DF)


During high network load, the cluster's NFS clients unrecoverably lose
the NFS server, giving the usual NFS errors in the client logs:

Jun 18 17:38:04 node9 kernel: nfs: server nutmeg.biop.ox.ac.uk still not responding
Jun 18 17:38:08 node9 kernel: nfs: task 73528 can't get a request slot


Running tcpdump on the MASQ gateway seems to indicate that the problem
might be due to the demasquerading of NFS 'reply ok' messages, sent when
the NFS server is able to complete requests. According to tcpdump, the
reply is demasqueraded incorrectly and sent to the wrong client.

[node9 and node13 are both clients, valm is the MASQ gateway and
nutmeg/lmb the NFS server]

13:01:04.296112 node9.biop.ox.a.3873443981 > nutmeg.biop.ox..nfs: 68 null
13:01:04.296127 valm.biop.ox.ac.3873443981 > nutmeg.biop.ox..nfs: 68 null
13:01:04.298254 lmb.biop.ox.ac..nfs > valm.biop.ox.ac.3873443981: reply ok 24
13:01:04.298274 lmb.biop.ox.ac..nfs > node13.biop.ox..3873443981: reply ok 24

In the above case, the request is sent from node9 but the reply is
returned to node13. No other network services seem to be affected; I can
still ftp, ssh, ping, etc. from the masqueraded machines even when NFS is
down.
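
For anyone wanting to check a longer capture for the same symptom, here is
a rough Python sketch that scans tcpdump text for replies delivered to a
client other than the one that sent the request. It relies on the fact that
tcpdump prints the RPC transaction ID (XID) in place of the port number on
NFS lines, so the number after the host name can be used to pair requests
with replies. The function names (split_endpoint, find_misrouted) and the
gateway="valm" default are my own illustrative choices, and the parsing
assumes the line layout shown in the traces above. Two clients that happen
to pick the same XID would confuse it, so treat any hit as a pointer to
look at the raw packets rather than as proof.

```python
import re

# Matches the start of a tcpdump NFS line: "<time> <src> > <dst>:".
LINE_RE = re.compile(r"^(\d\d:\d\d:\d\d\.\d+) (\S+) > (\S+):")


def split_endpoint(endpoint):
    """Split 'host.tail' into (short host name, tail); tail is 'nfs' or an XID."""
    host, _, tail = endpoint.rpartition(".")
    return host.split(".")[0], tail


def find_misrouted(lines, gateway="valm"):
    """Return (xid, requester, recipient) for replies sent to the wrong client.

    Assumption: any host other than `gateway` that talks to the NFS port
    is a masqueraded client.
    """
    requester = {}  # xid -> first client seen sending a request with that xid
    misrouted = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # wrapped continuation lines such as "24" are skipped
        src_host, src_tail = split_endpoint(m.group(2))
        dst_host, dst_tail = split_endpoint(m.group(3))
        if dst_tail == "nfs" and src_host != gateway:
            # Request as sent by the client, before masquerading.
            requester.setdefault(src_tail, src_host)
        elif src_tail == "nfs" and dst_host != gateway:
            # Reply after demasquerading, on its way to a client.
            sender = requester.get(dst_tail)
            if sender is not None and sender != dst_host:
                misrouted.append((dst_tail, sender, dst_host))
    return misrouted


# The failing trace from this post, wrapped as the mailer wrapped it.
TRACE = """\
13:01:04.296112 node9.biop.ox.a.3873443981 > nutmeg.biop.ox..nfs: 68 null
13:01:04.296127 valm.biop.ox.ac.3873443981 > nutmeg.biop.ox..nfs: 68 null
13:01:04.298254 lmb.biop.ox.ac..nfs > valm.biop.ox.ac.3873443981: reply ok
24
13:01:04.298274 lmb.biop.ox.ac..nfs > node13.biop.ox..3873443981: reply ok
24
"""

if __name__ == "__main__":
    for xid, sender, recipient in find_misrouted(TRACE.splitlines()):
        print(f"xid {xid}: request from {sender}, reply delivered to {recipient}")
```

Run against the trace above, this reports the node9 request whose reply
went to node13. Run against the earlier node29 trace, it reports nothing.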

Configuration:

The client machines and MASQ gateway are all running kernel 2.2.19. As
well as the masqueraded nodes, one of the machines on the internal network
has a 1:1 NAT to make it visible to the outside world. The NFS server is
hosted on a VMS cluster.

/sbin/ip route add nat 163.1.16.54 via 192.168.1.113
/sbin/ip rule add prio 320 from 192.168.1.113 nat 163.1.16.54

/sbin/ipchains -P forward DENY
/sbin/ipchains -A forward -i eth0 -s 163.1.16.54    -j ACCEPT
/sbin/ipchains -A forward -i eth1 -d 192.168.1.113  -j ACCEPT
/sbin/ipchains -A forward -i eth0 -s 192.168.1.0/24 -j MASQ

If anyone has any suggestions on how to remedy the problem I would be
grateful. Full packet dumps etc. are available on request.

Regards,

Guy Coates



----------------------------------------------------------------
                 --A fool and his money are soon venture capital.

   Tel   : +44 (0)1865 275390 (W)  +44 (0)7801 710224 (M)
   Mail  : Laboratory of Molecular Biophysics,
           University of Oxford,
           Rex Richards Building, South Parks Road,
           Oxford, OX1 3QU

_______________________________________________
Masq maillist  -  [EMAIL PROTECTED]