Ok, a server was already hung when I got to work today.
**********************************
x4500-04: NFS Server, Sol 10 5/08
Server IP (real) 172.20.12.226 netmask ffffff00
NFS IP (alias) 172.20.12.227 netmask ffffff00
x4500-04:~# netstat -in ; netstat -rn
Name     Mtu   Net/Dest     Address        Ipkts       Ierrs  Opkts       Oerrs  Collis  Queue
lo0      8232  127.0.0.0    127.0.0.1      1411        0      1411        0      0       0
e1000g0  1500  172.20.12.0  172.20.12.226  2762497849  0      1789082372  0      0       0
e1000g1  1500  172.20.19.0  172.20.19.226  96059758    0      52485074    0      0       0
Routing Table: IPv4
Destination Gateway Flags Ref Use Interface
-------------------- -------------------- ----- ----- ---------- ---------
default 172.20.12.1 UG 1 20456
172.20.12.0 172.20.12.226 U 1 45968 e1000g0
172.20.12.0 172.20.12.227 U 1 0 e1000g0:1
172.20.19.0 172.20.19.226 U 1 1662 e1000g1
224.0.0.0 172.20.12.226 U 1 0 e1000g0
127.0.0.1 127.0.0.1 UH 5 316 lo0
**********************************
NFS client: Sol 10 5/08
Client IP 172.20.12.6 netmask ffffff00
# netstat -in ; netstat -rn
Name     Mtu   Net/Dest     Address      Ipkts     Ierrs  Opkts     Oerrs  Collis  Queue
lo0      8232  127.0.0.0    127.0.0.1    2175      0      2175      0      0       0
e1000g0  1500  172.20.12.0  172.20.12.6  43315618  0      41987515  0      0       0
e1000g1  1500  172.20.11.0  172.20.11.6  19673254  0      13928826  0      0       0
Routing Table: IPv4
Destination Gateway Flags Ref Use Interface
-------------------- -------------------- ----- ----- ---------- ---------
default 172.20.11.4 UG 1 52386
10.0.0.0 172.20.12.1 UG 1 0
172.16.0.0 172.20.12.1 UG 1 193
172.20.11.0 172.20.11.6 U 1 2406 e1000g1
172.20.12.0 172.20.12.6 U 1 3163 e1000g0
192.168.0.0 172.20.12.1 UG 1 120
224.0.0.0 172.20.12.6 U 1 0 e1000g0
127.0.0.1 127.0.0.1 UH 4 2046 lo0
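For reference, netmask ffffff00 is a /24, so the server's real address, its
alias and the client's e1000g0 address should all be on the same subnet, with
no gateway between them. A small Python sketch to double-check that (the
function names are just ones I made up for this post, and it assumes a
contiguous netmask):

```python
import ipaddress

def hex_mask_to_prefix(hex_mask):
    """Convert a Solaris-style hex netmask such as 'ffffff00' to a prefix length."""
    mask = int(hex_mask, 16)
    # Counting set bits is only valid for a contiguous mask (assumed here).
    return bin(mask).count("1")

def same_subnet(addresses, hex_mask):
    """Return (True/False, set of networks) for the given addresses and mask."""
    prefix = hex_mask_to_prefix(hex_mask)
    networks = {
        ipaddress.ip_interface(f"{addr}/{prefix}").network for addr in addresses
    }
    return len(networks) == 1, networks

# Server real IP, server alias IP, client IP (all taken from the output above).
ADDRS = ["172.20.12.226", "172.20.12.227", "172.20.12.6"]
shared, nets = same_subnet(ADDRS, "ffffff00")
print(shared, nets)
```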
*********************************
Snoop running on NFS Client 172.20.12.6 attempting to (re)mount volume
with TCP:
# snoop -r host 172.20.12.227 or host 172.20.12.226 &
# mount /export/www
172.20.12.6 -> 172.20.12.227 PORTMAP C GETPORT prog=100005 (MOUNT)
vers=3 proto=UDP
172.20.12.226 -> 172.20.12.6 PORTMAP R GETPORT port=39049
172.20.12.6 -> 172.20.12.227 MOUNT3 C Null
172.20.12.226 -> 172.20.12.6 MOUNT3 R Null
172.20.12.6 -> 172.20.12.227 MOUNT3 C Mount /export/www
172.20.12.226 -> 172.20.12.6 MOUNT3 R Mount OK FH=D402 Auth=unix
172.20.12.6 -> 172.20.12.227 PORTMAP C GETPORT prog=100003 (NFS)
vers=3 proto=TCP
172.20.12.226 -> 172.20.12.6 PORTMAP R GETPORT port=2049
172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Syn Seq=788700586
Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
172.20.12.227 -> 172.20.12.6 TCP D=63800 S=2049 Syn Ack=788700587
Seq=3596066619 Len=0 Win=49640 Options=<mss 1460,nop,wscale
0,nop,nop,sackOK>
172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Ack=3596066620
Seq=788700587 Len=0 Win=49640
172.20.12.6 -> 172.20.12.227 NFS C NULL3
172.20.12.227 -> 172.20.12.6 TCP D=63800 S=2049 Ack=788700707
Seq=3596066620 Len=0 Win=49520
172.20.12.227 -> 172.20.12.6 NFS R NULL3
172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Ack=3596066648
Seq=788700707 Len=0 Win=49640
172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Fin Ack=3596066648
Seq=788700707 Len=0 Win=49640
172.20.12.227 -> 172.20.12.6 TCP D=63800 S=2049 Ack=788700708
Seq=3596066648 Len=0 Win=49640
172.20.12.227 -> 172.20.12.6 TCP D=63800 S=2049 Fin Ack=788700708
Seq=3596066648 Len=0 Win=49640
172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Ack=3596066649
Seq=788700708 Len=0 Win=49640
172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Syn Seq=2946510831 Len=0
Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Syn Seq=2946510831 Len=0
Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Syn Seq=2946510831 Len=0
Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
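Interesting: it looks like x4500-04 is replying from the wrong IP. The
PORTMAP and MOUNT requests go to the alias (172.20.12.227), but the
replies come back from the real address (172.20.12.226).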
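To make that easier to spot in a longer capture, here is a rough Python sketch
that flags RPC replies whose source address differs from the address the call
was sent to. It only understands the "snoop -r" summary lines of the form shown
above; the regex and the function name are my own assumptions, not anything
snoop provides.

```python
import re

# Matches RPC summary lines such as:
#   172.20.12.6 -> 172.20.12.227 MOUNT3 C Null
# TCP segment lines and wrapped continuation lines do not match.
LINE = re.compile(r"^(\S+) -> (\S+)\s+(\S+) ([CR]) (.*)$")

def mismatched_replies(lines):
    """Return (called_addr, replying_addr, summary) for each RPC reply whose
    source differs from the address the preceding call was sent to."""
    last_call = {}  # (client, protocol) -> server address the call went to
    findings = []
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue
        src, dst, proto, kind, rest = m.groups()
        if kind == "C":
            last_call[(src, proto)] = dst
        else:
            called = last_call.get((dst, proto))
            if called is not None and called != src:
                findings.append((called, src, f"{proto} R {rest}"))
    return findings

CAPTURE = """\
172.20.12.6 -> 172.20.12.227 PORTMAP C GETPORT prog=100005 (MOUNT)
vers=3 proto=UDP
172.20.12.226 -> 172.20.12.6 PORTMAP R GETPORT port=39049
172.20.12.6 -> 172.20.12.227 MOUNT3 C Null
172.20.12.226 -> 172.20.12.6 MOUNT3 R Null
172.20.12.6 -> 172.20.12.227 MOUNT3 C Mount /export/www
172.20.12.226 -> 172.20.12.6 MOUNT3 R Mount OK FH=D402 Auth=unix
172.20.12.6 -> 172.20.12.227 PORTMAP C GETPORT prog=100003 (NFS)
vers=3 proto=TCP
172.20.12.226 -> 172.20.12.6 PORTMAP R GETPORT port=2049
172.20.12.6 -> 172.20.12.227 NFS C NULL3
172.20.12.227 -> 172.20.12.6 NFS R NULL3
"""

findings = mismatched_replies(CAPTURE.splitlines())
for called, replied, summary in findings:
    print(f"call to {called} answered from {replied}: {summary}")
```

On the excerpt above this flags the PORTMAP and MOUNT3 replies (sent over UDP)
but not the NFS NULL3 reply, which comes back from the alias over TCP.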
Packet capture on x4500-04:
# snoop -r host 172.20.12.6
Using device /dev/e1000g0 (promiscuous mode)
172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Rst Ack=0 Seq=2924968134
Len=0 Win=49640
172.20.12.227 -> 172.20.12.6 TCP D=664 S=2049 Rst Win=49640
172.20.12.6 -> 172.20.12.227 PORTMAP C GETPORT prog=100005 (MOUNT)
vers=3 proto=UDP
172.20.12.226 -> 172.20.12.6 PORTMAP R GETPORT port=39049
172.20.12.6 -> 172.20.12.227 MOUNT3 C Null
172.20.12.226 -> 172.20.12.6 MOUNT3 R Null
172.20.12.6 -> 172.20.12.227 MOUNT3 C Mount /export/www
172.20.12.226 -> 172.20.12.6 MOUNT3 R Mount OK FH=D402 Auth=unix
172.20.12.6 -> 172.20.12.227 PORTMAP C GETPORT prog=100003 (NFS)
vers=3 proto=TCP
172.20.12.226 -> 172.20.12.6 PORTMAP R GETPORT port=2049
172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Syn Seq=788700586
Len=0 Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
172.20.12.227 -> 172.20.12.6 TCP D=63800 S=2049 Syn Ack=788700587
Seq=3596066619 Len=0 Win=49640 Options=<mss 1460,nop,wscale
0,nop,nop,sackOK>
172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Ack=3596066620
Seq=788700587 Len=0 Win=49640
172.20.12.6 -> 172.20.12.227 NFS C NULL3
172.20.12.227 -> 172.20.12.6 TCP D=63800 S=2049 Ack=788700707
Seq=3596066620 Len=0 Win=49520
172.20.12.227 -> 172.20.12.6 NFS R NULL3
172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Ack=3596066648
Seq=788700707 Len=0 Win=49640
172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Fin Ack=3596066648
Seq=788700707 Len=0 Win=49640
172.20.12.227 -> 172.20.12.6 TCP D=63800 S=2049 Ack=788700708
Seq=3596066648 Len=0 Win=49640
172.20.12.227 -> 172.20.12.6 TCP D=63800 S=2049 Fin Ack=788700708
Seq=3596066648 Len=0 Win=49640
172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 Ack=3596066649
Seq=788700708 Len=0 Win=49640
172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Syn Seq=2946510831 Len=0
Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
172.20.12.227 -> 172.20.12.6 TCP D=664 S=2049 Ack=2876021783
Seq=3544124023 Len=0 Win=49640
172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Syn Seq=2946510831 Len=0
Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
172.20.12.227 -> 172.20.12.6 TCP D=664 S=2049 Ack=2876021783
Seq=3544124023 Len=0 Win=49640
172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Syn Seq=2946510831 Len=0
Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
172.20.12.227 -> 172.20.12.6 TCP D=664 S=2049 Ack=2876021783
Seq=3544124023 Len=0 Win=49640
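Putting the two captures side by side: the three SYNs from port 664 show up
in both, but the server's ACK replies to them only show up on the server
side. Below is a rough Python sketch of how the two text captures could be
compared mechanically. The helper names are made up for this post, and it
assumes both sides were captured with "snoop -r" and wrapped only at
whitespace. A packet missing from one capture could also just be an artefact
of when or how snoop was started, so treat the result as a hint.

```python
from collections import Counter

def unwrap(lines):
    """Join wrapped snoop output: a line without ' -> ' continues the
    previous packet summary."""
    packets = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if " -> " in line or not packets:
            packets.append(line)
        else:
            packets[-1] += " " + line
    return packets

def only_in_first(first, second):
    """Packets (counted with multiplicity) seen in first but not in second."""
    diff = Counter(unwrap(first)) - Counter(unwrap(second))
    return list(diff.elements())

SYN = (
    "172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Syn Seq=2946510831 Len=0\n"
    "Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>\n"
)
ACK = (
    "172.20.12.227 -> 172.20.12.6 TCP D=664 S=2049 Ack=2876021783\n"
    "Seq=3544124023 Len=0 Win=49640\n"
)

CLIENT = (SYN * 3).splitlines()          # tail of the client-side capture
SERVER = ((SYN + ACK) * 3).splitlines()  # tail of the server-side capture

missing = only_in_first(SERVER, CLIENT)
for packet in missing:
    print("seen on server only:", packet)
```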
*** Attempting mount using the real IP instead of the alias:
# mount -o vers=3,hard,intr,quota 172.20.12.226:/export/www /export/www
ssl01:/#
172.20.12.6 -> 172.20.12.226 PORTMAP C GETPORT prog=100005 (MOUNT)
vers=3 proto=UDP
172.20.12.226 -> 172.20.12.6 PORTMAP R GETPORT port=39049
172.20.12.6 -> 172.20.12.226 MOUNT3 C Null
172.20.12.226 -> 172.20.12.6 MOUNT3 R Null
172.20.12.6 -> 172.20.12.226 MOUNT3 C Mount /export/www
172.20.12.226 -> 172.20.12.6 MOUNT3 R Mount OK FH=D402 Auth=unix
172.20.12.6 -> 172.20.12.226 PORTMAP C GETPORT prog=100003 (NFS)
vers=3 proto=TCP
172.20.12.226 -> 172.20.12.6 PORTMAP R GETPORT port=2049
172.20.12.6 -> 172.20.12.226 TCP D=2049 S=63802 Syn Seq=88322761 Len=0
Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
172.20.12.226 -> 172.20.12.6 TCP D=63802 S=2049 Syn Ack=88322762
Seq=3700270536 Len=0 Win=49640 Options=<mss 1460,nop,wscale
0,nop,nop,sackOK>
172.20.12.6 -> 172.20.12.226 TCP D=2049 S=63802 Ack=3700270537
Seq=88322762 Len=0 Win=49640
172.20.12.6 -> 172.20.12.226 NFS C NULL3
172.20.12.226 -> 172.20.12.6 TCP D=63802 S=2049 Ack=88322882
Seq=3700270537 Len=0 Win=49520
172.20.12.226 -> 172.20.12.6 NFS R NULL3
172.20.12.6 -> 172.20.12.226 TCP D=2049 S=63802 Ack=3700270565
Seq=88322882 Len=0 Win=49640
172.20.12.6 -> 172.20.12.226 TCP D=2049 S=63802 Fin Ack=3700270565
Seq=88322882 Len=0 Win=49640
172.20.12.226 -> 172.20.12.6 TCP D=63802 S=2049 Ack=88322883
Seq=3700270565 Len=0 Win=49640
172.20.12.226 -> 172.20.12.6 TCP D=63802 S=2049 Fin Ack=88322883
Seq=3700270565 Len=0 Win=49640
172.20.12.6 -> 172.20.12.226 TCP D=2049 S=63802 Ack=3700270566
Seq=88322883 Len=0 Win=49640
172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 Rst Ack=0 Seq=3056789346
Len=0 Win=49640
172.20.12.6 -> 172.20.12.226 TCP D=2049 S=620 Syn Seq=1932893789 Len=0
Win=49640 Options=<mss 1460,nop,wscale 0,nop,nop,sackOK>
172.20.12.226 -> 172.20.12.6 TCP D=620 S=2049 Syn Ack=1932893790
Seq=3700480396 Len=0 Win=49640 Options=<mss 1460,nop,wscale
0,nop,nop,sackOK>
172.20.12.6 -> 172.20.12.226 TCP D=2049 S=620 Ack=3700480397
Seq=1932893790 Len=0 Win=49640
172.20.12.6 -> 172.20.12.226 NFS C FSINFO3 FH=D402
172.20.12.226 -> 172.20.12.6 TCP D=620 S=2049 Ack=1932893946
Seq=3700480397 Len=0 Win=49640
172.20.12.226 -> 172.20.12.6 NFS R FSINFO3 OK
172.20.12.6 -> 172.20.12.226 TCP D=2049 S=620 Ack=3700480565
Seq=1932893946 Len=0 Win=49640
172.20.12.6 -> 172.20.12.226 NFS C FSSTAT3 FH=D402
172.20.12.226 -> 172.20.12.6 TCP D=620 S=2049 Ack=1932894102
Seq=3700480565 Len=0 Win=49640
172.20.12.226 -> 172.20.12.6 NFS R FSSTAT3 OK
172.20.12.6 -> 172.20.12.226 TCP D=2049 S=620 Ack=3700480737
Seq=1932894102 Len=0 Win=49640
This works without issue, so it does not look like an NFS problem; it
seems to be related to the alias IP.
Do you know a way around this? Or can you suggest a better place to ask?
As a quick fix we will forgo the alias IP and mount directly on the
"real" IP. But why does changing the protocol (TCP->UDP or vice versa)
get around it, and why does rebooting the NFS client also work? Did we
create the aliases incorrectly?
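One detail from the server-side capture that may bear on these questions: the
first TCP connection (client port 63800) gets a proper SYN-ACK, while the SYN
from client port 664 (Seq=2946510831) is answered with a bare ACK carrying
Ack=2876021783, which does not acknowledge that SYN. My guess, and it is only
a guess, is that the server still holds state for an older connection from
that port. A small Python sketch of the check, using lines from the captures
(unwrapped by hand; the function names are hypothetical):

```python
import re

TCP = re.compile(r"^(\S+) -> (\S+) TCP D=(\d+) S=(\d+) (.*)$")

def parse_tcp(entry):
    """Parse one unwrapped snoop TCP summary line into a dict, or None."""
    m = TCP.match(entry)
    if not m:
        return None
    src, dst, dport, sport, rest = m.groups()
    seq = re.search(r"\bSeq=(\d+)", rest)
    ack = re.search(r"\bAck=(\d+)", rest)
    return {
        "src": src, "dst": dst, "sport": int(sport), "dport": int(dport),
        "syn": bool(re.search(r"\bSyn\b", rest)),
        "seq": int(seq.group(1)) if seq else None,
        "ack": int(ack.group(1)) if ack else None,
    }

def answers_syn(syn, reply):
    """True if reply is the SYN-ACK that the given SYN expects."""
    return bool(
        reply["syn"]
        and reply["ack"] == (syn["seq"] + 1) % 2**32
        and (reply["src"], reply["sport"]) == (syn["dst"], syn["dport"])
        and (reply["dst"], reply["dport"]) == (syn["src"], syn["sport"])
    )

good_syn = parse_tcp("172.20.12.6 -> 172.20.12.227 TCP D=2049 S=63800 "
                     "Syn Seq=788700586 Len=0 Win=49640")
good_reply = parse_tcp("172.20.12.227 -> 172.20.12.6 TCP D=63800 S=2049 "
                       "Syn Ack=788700587 Seq=3596066619 Len=0 Win=49640")
bad_syn = parse_tcp("172.20.12.6 -> 172.20.12.227 TCP D=2049 S=664 "
                    "Syn Seq=2946510831 Len=0 Win=49640")
bad_reply = parse_tcp("172.20.12.227 -> 172.20.12.6 TCP D=664 S=2049 "
                      "Ack=2876021783 Seq=3544124023 Len=0 Win=49640")

print("port 63800 handshake ok:", answers_syn(good_syn, good_reply))
print("port 664 handshake ok:  ", answers_syn(bad_syn, bad_reply))
```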
I apologise for the noise on the NFS discussion list.
Lund
Dai Ngo wrote:
> The problem seems to be on the TCP connection between the client and the
> nfsd on the server. The portmap and mount requests used UDP and they
> went OK.
>
> There are a number of TCP RST packets sent from both the client and the
> server, which indicates there might be a problem with lost packets
> causing both sides to be out of sync.
>
> It looks like the server has 2 NICs on the same subnet, 172.20.12.221
> and 172.20.12.220. Have you tried disabling 172.20.12.220 and just using
> 172.20.12.221 to see if it helps? What does the output of 'netstat -in'
> and 'netstat -rn' on the server and the client look like?
>
> By the way, where were the packets captured, on the server or the
> client? It would be more useful if you could capture the packets on both
> sides and attach the raw capture files so they can be compared and
> examined in more detail.
--
Jorgen Lundman | <lundman at lundman.net>
Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work)
Shibuya-ku, Tokyo | +81 (0)90-5578-8500 (cell)
Japan | +81 (0)3 -3375-1767 (home)