I just discovered a fully reproducible NFS problem...
Decided to install StarOffice on my PII-200MMX system; the main install worked
fine over NFS. Then, when I went to copy over the SO5.2 patch files, I noticed
that 4 of the 18 files *always* ended the copy with:
[pfortin@pfortin program]$ cp /usr/local/src/StarOffice/1*/p*/* .
cp: /usr/local/src/StarOffice/109939-02/program/libcnt569li.so: Input/output error
cp: /usr/local/src/StarOffice/109939-02/program/libsc569li.so: Input/output error
cp: /usr/local/src/StarOffice/109939-02/program/libsd569li.so: Input/output error
cp: /usr/local/src/StarOffice/109939-02/program/libsfx569li.so: Input/output error
It's always the same 4 files... even if I copy them one at a time, only those
files are affected...
bones:/usr/local/src is NFS-mounted on pfortin (bones is the server, pfortin the client)...
I repeated the copies several times; each time the bad files were the same
ones and the truncated copies had identical sizes. Basically, the read of the
last NFS block gets an I/O error... confirmed by strace.
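To pin down the failing offset without strace, here is a minimal Python sketch (the function and script names are hypothetical) that reads a file with exactly one read() call per chunk and reports where the first OSError occurs. The 4096-byte chunk size is an assumption chosen to match the READ size seen in the trace below.

```python
import sys


def find_failing_block(path, block_size=4096):
    """Read `path` sequentially, one read() system call per chunk.

    Returns (bytes_read, failed_offset). failed_offset is None when the whole
    file was read cleanly; otherwise it is the file offset of the read that
    raised OSError (for example EIO from the NFS client).
    """
    total = 0
    # buffering=0 gives an unbuffered FileIO, so each f.read() is a single read(2).
    with open(path, "rb", buffering=0) as f:
        while True:
            try:
                chunk = f.read(block_size)
            except OSError as exc:
                print(f"{path}: read at offset {total} failed: {exc}", file=sys.stderr)
                return total, total
            if not chunk:  # b"" means end of file
                return total, None
            total += len(chunk)


if __name__ == "__main__":
    # Usage: python find_failing_block.py FILE...
    for name in sys.argv[1:]:
        print(name, find_failing_block(name))
```

For example, `python find_failing_block.py /usr/local/src/StarOffice/109939-02/program/libsc569li.so` should report the offset of the bad block, which can then be matched against the READ offsets in a capture.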
Sniffing the LAN on the server (bones)... Note that the corrupted packet (3081
below) contains the data from a packet sent MUCH EARLIER (packet 2317: over 760
packets and about 10 seconds between these two!!). This seems to indicate a s/w
bug... I fail to see how the LAN or h/w adapters could hang on to this data for
so long... Besides, it is repeatable and the results are consistent.
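The old packet can also be identified mechanically: data bytes 0x50-0x63 of packet 3081 form a complete IPv4 header whose header checksum still verifies, and its datagram ID (29837) points back at packet 2317. A minimal Python sketch that scans raw frame bytes for such headers, assuming option-less headers (first byte 0x45); the function names are hypothetical:

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit ones'-complement sum used by the IP/UDP checksums (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def find_embedded_ipv4_headers(blob: bytes):
    """Return (offset, datagram_id, checksum) for each option-less IPv4 header
    (first byte 0x45) inside `blob` whose header checksum verifies."""
    hits = []
    for off in range(len(blob) - 19):
        if blob[off] != 0x45:  # version 4, IHL 5 (20-byte header, no options)
            continue
        hdr = blob[off:off + 20]
        if ones_complement_sum(hdr) == 0xFFFF:  # a valid header sums to all ones
            ident = int.from_bytes(hdr[4:6], "big")
            checksum = int.from_bytes(hdr[10:12], "big")
            hits.append((off, ident, checksum))
    return hits


# Data bytes 0x40-0x6f of packet 3081, copied from the dump below.
PACKET_3081_SLICE = bytes.fromhex(
    "aa 00 00 aa 00 cf 8d 65 00 60 97 57 63 07 08 00 "
    "45 00 00 9c 74 8d 00 00 40 11 77 a9 c0 a8 86 65 "
    "c0 a8 86 64 03 1f 08 01 00 88 c2 81 d8 9a 43 fd"
)

if __name__ == "__main__":
    for off, ident, checksum in find_embedded_ipv4_headers(PACKET_3081_SLICE):
        print(f"offset {off:#x}: datagram ID {ident}, header checksum {checksum:#06x}")
```

On this slice it finds one header at offset 0x10 (data offset 0x50) with datagram ID 29837 and checksum 0x77a9; looking that ID up in the earlier part of the capture gives packet 2317.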
Pierre
################################################################################
Packet: 003076/003096 Time: 19:40:14.050 Level: BYTES/ETHER/IP/UDP
+------------------------------------------------------------------------------+
| OSI-Level 1: Byte Level Packet size: 1514 |
+------------------------------------------------------------------------------+
Packet size: 0x05ea
Time stamp: 0x04388962
Network Type: 1 Ethernet/802.3
+------------------------------------------------------------------------------+
| OSI-Level 2: Ethernet Packet size: 1514 |
+------------------------------------------------------------------------------+
Source: 00:aa:00:cf:8d:65 bones Vendor: Intel
Destination: 00:60:97:57:63:07 pfortin Vendor: unknown
+------------------------------------------------------------------------------+
| OSI-Level 3: IP (Internet Protocol) Packet size: 1500 |
+------------------------------------------------------------------------------+
Type Of Service: precedence = Routine Datagram ID: 39862
delay = normal IP Control Flags: don't fragment = no
throughput = normal more fragment = yes
reliability = normal Checksum: 0x2b40
Fragment offset: 0 Time-To-Live: 64 hops
Protocol ID: 17 UDP Total length: 1500
Source: 192.168.134.100 bones
Destination: 192.168.134.101 pfortin
Options: [no options]
+------------------------------------------------------------------------------+
| OSI-Level 4: UDP (User Datagram Protocol) Packet size: 1480 |
+------------------------------------------------------------------------------+
Source Port: 2049 nfs Message Length: 4204
Destination Port: 799 Checksum: 0xef56
Data: 0000: 92 9b 43 fd 00 00 00 01 00 00 00 00 00 00 00 00 ..C.............
0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 ................
[snip]
05a0: 01 00 00 00 30 31 2e 30 31 00 00 00 08 00 00 00 ....01.01.......
05b0: 00 00 00 00 01 00 00 00 30 31 2e 30 31 00 00 00 ........01.01...
################################################################################
Packet: 003077/003096 Time: 19:40:14.057 Level: BYTES/ETHER/IP/UDP
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
This request packet (pfortin's NFS READ call for the last block of the file) is OK... BUT...
pay close attention to the underlined bytes; they reappear in packet 3081...
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
+------------------------------------------------------------------------------+
| OSI-Level 1: Byte Level Packet size: 170 |
+------------------------------------------------------------------------------+
Packet size: 0x00aa
Time stamp: 0x04388969
Network Type: 1 Ethernet/802.3
+------------------------------------------------------------------------------+
| OSI-Level 2: Ethernet Packet size: 170 |
+------------------------------------------------------------------------------+
Source: 00:60:97:57:63:07 pfortin Vendor: unknown
Destination: 00:aa:00:cf:8d:65 bones Vendor: Intel
+------------------------------------------------------------------------------+
| OSI-Level 3: IP (Internet Protocol) Packet size: 156 |
+------------------------------------------------------------------------------+
Type Of Service: precedence = Routine Datagram ID: 30028
delay = normal IP Control Flags: don't fragment = no
throughput = normal more fragment = no
reliability = normal Checksum: 0x76ea
Fragment offset: 0 Time-To-Live: 64 hops
Protocol ID: 17 UDP Total length: 156
Source: 192.168.134.101 pfortin
Destination: 192.168.134.100 bones
Options: [no options]
+------------------------------------------------------------------------------+
| OSI-Level 4: UDP (User Datagram Protocol) Packet size: 136 |
+------------------------------------------------------------------------------+
Source Port: 799 Message Length: 136
Destination Port: 2049 nfs Checksum: 0x576b
Data: 0000: 93 9b 43 fd 00 00 00 00 00 00 00 02 00 01 86 a3 ..C.............
0010: 00 00 00 02 00 00 00 06 00 00 00 01 00 00 00 2c ...............,
0020: 00 23 95 25 00 00 00 14 70 66 6f 72 74 69 6e 2e .#.%....pfortin.
0030: 72 65 6d 69 6e 65 72 2e 68 6f 6d 65 00 00 01 f4 reminer.home....
^^^^^^^^^^^^^^^^^^^^^^^
0040: 00 00 01 f5 00 00 00 01 00 00 01 f5 00 00 00 00 ................
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
0050: 00 00 00 00 ca ba eb fe 9c 48 10 00 91 48 10 00 .........H...H..
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
0060: 46 03 00 00 46 03 00 00 01 50 00 00 43 69 66 a8 F...F....P..Cif.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
0070: 00 00 00 00 00 2c 60 00 00 00 10 00 00 00 10 00 .....,`.........
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
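Decoded, packet 3077 is an ONC RPC CALL (RFC 1057) with xid 0x939b43fd to program 100003 (NFS) version 2, procedure 6 (READ). It carries an AUTH_UNIX credential for pfortin.reminer.home (uid 500, gid 501) and asks for 4096 bytes at offset 0x2c6000; the reply with the same xid is the 1696-byte datagram 39863 (packets 3079 and 3078). A minimal Python decoder sketch, assuming an AUTH_UNIX credential and NFSv2's fixed 32-byte file handle (function and constant names are hypothetical):

```python
import struct

NFS_PROGRAM = 100003  # ONC RPC program number for NFS
NFSPROC_READ = 6      # READ in NFS version 2 (RFC 1094)


def decode_nfs2_read_call(data: bytes) -> dict:
    """Decode an ONC RPC CALL (RFC 1057) carrying NFSv2 READ arguments.

    Sketch only: assumes an AUTH_UNIX credential and NFSv2's fixed
    32-byte file handle; raises ValueError on anything else.
    """
    pos = 0

    def u32() -> int:
        nonlocal pos
        (value,) = struct.unpack_from(">I", data, pos)  # XDR ints are big-endian
        pos += 4
        return value

    xid, msg_type, rpcvers, prog, vers, proc = [u32() for _ in range(6)]
    if msg_type != 0 or rpcvers != 2:
        raise ValueError("not an RPC version 2 CALL")
    if (prog, vers, proc) != (NFS_PROGRAM, 2, NFSPROC_READ):
        raise ValueError("not an NFSv2 READ")
    cred_flavor, cred_len = u32(), u32()
    if cred_flavor != 1:  # AUTH_UNIX
        raise ValueError("expected AUTH_UNIX credentials")
    cred_end = pos + cred_len
    stamp = u32()
    name_len = u32()
    machine = data[pos:pos + name_len].decode("ascii")
    pos += (name_len + 3) & ~3  # XDR opaque data is padded to 4 bytes
    uid, gid = u32(), u32()
    gids = [u32() for _ in range(u32())]  # count first, then that many gids
    pos = cred_end
    _verf_flavor, verf_len = u32(), u32()
    pos += (verf_len + 3) & ~3
    fhandle = data[pos:pos + 32]
    pos += 32
    offset, count, _totalcount = u32(), u32(), u32()
    return {"xid": xid, "stamp": stamp, "machine": machine, "uid": uid,
            "gid": gid, "gids": gids, "fhandle": fhandle.hex(),
            "offset": offset, "count": count}


# UDP payload of packet 3077 (0x0000-0x007f), copied from the dump above.
PACKET_3077 = bytes.fromhex(
    "93 9b 43 fd 00 00 00 00 00 00 00 02 00 01 86 a3 "
    "00 00 00 02 00 00 00 06 00 00 00 01 00 00 00 2c "
    "00 23 95 25 00 00 00 14 70 66 6f 72 74 69 6e 2e "
    "72 65 6d 69 6e 65 72 2e 68 6f 6d 65 00 00 01 f4 "
    "00 00 01 f5 00 00 00 01 00 00 01 f5 00 00 00 00 "
    "00 00 00 00 ca ba eb fe 9c 48 10 00 91 48 10 00 "
    "46 03 00 00 46 03 00 00 01 50 00 00 43 69 66 a8 "
    "00 00 00 00 00 2c 60 00 00 00 10 00 00 00 10 00"
)

if __name__ == "__main__":
    call = decode_nfs2_read_call(PACKET_3077)
    print(f"xid {call['xid']:#010x} from {call['machine']} "
          f"uid {call['uid']}: READ {call['count']} bytes at {call['offset']:#x}")
```

Running the same decoder on the payload of packet 2317 gives xid 0xd89a43fd and offset 0x20b000, which is how the stale bytes in packet 3081 can be told apart from the current request.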
################################################################################
Packet: 003078/003096 Time: 19:40:14.057 Level: BYTES/ETHER/IP
+------------------------------------------------------------------------------+
| OSI-Level 1: Byte Level Packet size: 250 |
+------------------------------------------------------------------------------+
Packet size: 0x00fa
Time stamp: 0x04388969
Network Type: 1 Ethernet/802.3
+------------------------------------------------------------------------------+
| OSI-Level 2: Ethernet Packet size: 250 |
+------------------------------------------------------------------------------+
Source: 00:aa:00:cf:8d:65 bones Vendor: Intel
Destination: 00:60:97:57:63:07 pfortin Vendor: unknown
+------------------------------------------------------------------------------+
| OSI-Level 3: IP (Internet Protocol) Packet size: 236 |
+------------------------------------------------------------------------------+
Type Of Service: precedence = Routine Datagram ID: 39863
delay = normal IP Control Flags: don't fragment = no
throughput = normal more fragment = no
reliability = normal Checksum: 0x4f76
Fragment offset: 185 Time-To-Live: 64 hops
Protocol ID: 17 UDP Total length: 236
Source: 192.168.134.100 bones
Destination: 192.168.134.101 pfortin
Options: [no options]
Data: 0000: 00 00 00 00 00 00 00 00 04 00 00 00 04 00 00 00 ................
0010: 06 01 00 00 06 00 00 00 03 00 00 00 44 38 2c 00 ............D8,.
0020: 44 28 2c 00 a0 01 00 00 03 00 00 00 00 00 00 00 D(,.............
0030: 04 00 00 00 08 00 00 00 0f 01 00 00 08 00 00 00 ................
0040: 03 00 00 00 00 3a 2c 00 00 2a 2c 00 a8 29 00 00 .....:,..*,..)..
0050: 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00 00 ........ .......
0060: 14 01 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................
0070: 00 2a 2c 00 a0 23 00 00 00 00 00 00 00 00 00 00 .*,..#..........
0080: 01 00 00 00 00 00 00 00 1d 01 00 00 07 00 00 00 ................
0090: 00 00 00 00 a0 23 00 00 a0 4d 2c 00 c0 12 00 00 .....#...M,.....
00a0: 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................
00b0: 11 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................
00c0: 60 60 2c 00 23 01 00 00 00 00 00 00 00 00 00 00 ``,.#...........
00d0: 01 00 00 00 00 00 00 00 ........
################################################################################
Packet: 003079/003096 Time: 19:40:14.057 Level: BYTES/ETHER/IP/UDP
+------------------------------------------------------------------------------+
| OSI-Level 1: Byte Level Packet size: 1514 |
+------------------------------------------------------------------------------+
Packet size: 0x05ea
Time stamp: 0x04388969
Network Type: 1 Ethernet/802.3
+------------------------------------------------------------------------------+
| OSI-Level 2: Ethernet Packet size: 1514 |
+------------------------------------------------------------------------------+
Source: 00:aa:00:cf:8d:65 bones Vendor: Intel
Destination: 00:60:97:57:63:07 pfortin Vendor: unknown
+------------------------------------------------------------------------------+
| OSI-Level 3: IP (Internet Protocol) Packet size: 1500 |
+------------------------------------------------------------------------------+
Type Of Service: precedence = Routine Datagram ID: 39863
delay = normal IP Control Flags: don't fragment = no
throughput = normal more fragment = yes
reliability = normal Checksum: 0x2b3f
Fragment offset: 0 Time-To-Live: 64 hops
Protocol ID: 17 UDP Total length: 1500
Source: 192.168.134.100 bones
Destination: 192.168.134.101 pfortin
Options: [no options]
+------------------------------------------------------------------------------+
| OSI-Level 4: UDP (User Datagram Protocol) Packet size: 1480 |
+------------------------------------------------------------------------------+
Source Port: 2049 nfs Message Length: 1696
Destination Port: 799 Checksum: 0xe629
Data: 0000: 93 9b 43 fd 00 00 00 01 00 00 00 00 00 00 00 00 ..C.............
0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 ................
[snip]
05a0: 04 00 00 00 00 00 00 00 01 01 00 00 01 00 00 00 ................
05b0: 03 00 00 00 cc e2 2b 00 cc d2 2b 00 78 55 00 00 ......+...+.xU..
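Packets 3078 and 3079 are the two IP fragments of datagram 39863, the reply to request 3077 (xid 0x939b43fd), and the capture lists the last fragment before the first. The IP fragment offset is counted in 8-byte units, so 185 means byte 1480, and the two payloads (1500 - 20 = 1480 and 236 - 20 = 216 bytes) add up to the 1696-byte UDP message length shown in 3079. A small Python sketch of that bookkeeping, assuming option-less 20-byte IP headers (the Fragment type is hypothetical):

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    packet: int            # capture packet number, for reporting only
    offset_units: int      # IP "Fragment offset" field, counted in 8-byte units
    more_fragments: bool   # IP "more fragment" flag
    payload_len: int       # IP total length minus the 20-byte header (no options)


def reassembled_length(fragments):
    """Return the reassembled IP payload length, or None if the fragments
    leave a gap, overlap, or the highest-offset fragment still has MF set."""
    if not fragments:
        return None
    ordered = sorted(fragments, key=lambda f: f.offset_units)  # arrival order is irrelevant
    expected = 0
    for frag in ordered:
        if frag.offset_units * 8 != expected:
            return None
        expected += frag.payload_len
    if ordered[-1].more_fragments:
        return None
    return expected


# Datagram 39863 in capture order: the last fragment (3078) is listed before the first (3079).
DATAGRAM_39863 = [
    Fragment(packet=3078, offset_units=185, more_fragments=False, payload_len=236 - 20),
    Fragment(packet=3079, offset_units=0, more_fragments=True, payload_len=1500 - 20),
]

if __name__ == "__main__":
    print(reassembled_length(DATAGRAM_39863))  # 1696, the UDP message length in 3079
```

The reply is therefore complete on the wire as far as bones is concerned; with 1696 bytes instead of the 4204 of a full 4096-byte READ reply (datagram 39862), it is the short final block of the file.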
################################################################################
Packet: 003080/003096 Time: 19:40:14.150 Level: BYTES/ETHER/IP/UDP
[snip: CUPS packet from bones: Datagram ID: 39864]
################################################################################
Packet: 003081/003096 Time: 19:40:14.748 Level: BYTES/ETHER
+------------------------------------------------------------------------------+
| OSI-Level 1: Byte Level Packet size: 170 |
+------------------------------------------------------------------------------+
Packet size: 0x00aa
Time stamp: 0x04388c1c
Network Type: 1 Ethernet/802.3
+------------------------------------------------------------------------------+
| OSI-Level 2: Ethernet Packet size: 170 |
+------------------------------------------------------------------------------+
Source: 01:f4:00:00:01:f5 Vendor: unknown
^^^^^^^^^^^^^^^^^
Destination: 68:6f:6d:65:00:00 Vendor: unknown
^^^^^^^^^^^^^^^^^
Protocol ID: 0000 Type: unknown
Data: 0000: 00 01 00 00 01 f5 00 00 00 00 00 00 00 00 ca ba ................
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
0010: eb fe 9c 48 10 00 91 48 10 00 46 03 00 00 46 03 ...H...H..F...F.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
0020: 00 00 01 50 00 00 43 69 66 a8 00 00 00 00 00 20 ...P..Cif......
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
0030: a0 00 00 00 10 00 00 00 10 00 08 00 10 a0 2e 57 ...............W
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ?????????????????
0040: aa 00 00 aa 00 cf 8d 65 00 60 97 57 63 07 08 00 .......e.`.Wc...
????? ================= =================
To: bones From: pfortin IP
0050: 45 00 00 9c 74 8d 00 00 40 11 77 a9 c0 a8 86 65 E...t...@.w....e
VL tos len #29837 fl/frgTTLudp cksm S=pfortin
0060: c0 a8 86 64 03 1f 08 01 00 88 c2 81 d8 9a 43 fd ...d..........C.
D=bones ----------------------- ===========
0070: 00 00 00 00 00 00 00 02 00 01 86 a3 00 00 00 02 ................
===============================================
0080: 00 00 00 06 00 00 00 01 00 00 00 2c 00 23 95 1b ...........,.#..
===============================================
0090: 00 00 00 14 70 66 6f 72 74 69 6e 2e ....pfortin.
===============================================
The entrails indicate that the above packet did come from pfortin and was
intended for bones, as the client's next request after the last NFS block.
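Even the bogus outer header of packet 3081 is recycled request data: its destination bytes 68 6f 6d 65 00 00 spell "home" (the tail of the AUTH_UNIX machine name pfortin.reminer.home), and its source bytes 01 f4 00 00 01 f5 are the tail of uid 500 followed by gid 501, in the same order as at offset 0x38 of packet 3077. The genuine Ethernet header starts at data offset 0x42. A minimal Python sketch that splits both 14-byte headers, where the host table is just the sniffer's own labels and the names are hypothetical:

```python
# MAC addresses as labelled by the sniffer in this capture.
KNOWN_HOSTS = {"00:aa:00:cf:8d:65": "bones", "00:60:97:57:63:07": "pfortin"}


def describe_ethernet_header(hdr: bytes) -> dict:
    """Split a 14-byte Ethernet II header into destination, source and EtherType."""
    if len(hdr) < 14:
        raise ValueError("need at least 14 bytes")

    def mac(raw: bytes) -> str:
        text = ":".join(f"{b:02x}" for b in raw)
        return KNOWN_HOSTS.get(text, text)

    return {"dst": mac(hdr[0:6]), "src": mac(hdr[6:12]),
            "ethertype": int.from_bytes(hdr[12:14], "big")}


# Frame bytes 0-13 of packet 3081, i.e. what the sniffer decoded as the header.
OUTER = bytes.fromhex("68 6f 6d 65 00 00 01 f4 00 00 01 f5 00 00")
# Data bytes 0x42-0x4f of packet 3081: the header of the frame hidden inside it.
INNER = bytes.fromhex("00 aa 00 cf 8d 65 00 60 97 57 63 07 08 00")

if __name__ == "__main__":
    print(OUTER[:4], describe_ethernet_header(OUTER))
    print(describe_ethernet_header(INNER))  # EtherType 0x0800 is IPv4
```

The inner header decodes to destination bones, source pfortin, EtherType 0x0800, which is consistent with a normal pfortin-to-bones IP frame sitting 0x42 bytes into the corrupted one.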
There is no apparent correlation between the file sizes and the failures. Nor is
there any apparent reason why these particular files always fail. I can FTP the
files just fine...
################################################################################
################################################################################
Here is the packet with IP datagram ID 29837 (packet 2317), sent MUCH earlier,
about 10 seconds before packet 3081...
Note that it contains the same data as the corrupted one above... compare the IP
checksum (0x77a9) and UDP checksum (0xc281) above and below...
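Identical IP and UDP checksums show the headers match, but the UDP checksum (RFC 768) also covers the IPv4 pseudo-header and the payload, so it can be recomputed over the complete packet 2317 as a cross-check. Packet 3081 only carries the first 48 bytes of that payload, so the full check is only possible on 2317. A minimal Python sketch, assuming the 128 data bytes shown below are the whole UDP payload (the UDP length field says 136 = 8 + 128); the names are hypothetical:

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit ones'-complement sum (RFC 1071); an odd trailing byte is zero-padded."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def udp_checksum(src_ip: bytes, dst_ip: bytes, segment: bytes) -> int:
    """UDP checksum over the IPv4 pseudo-header plus the UDP segment.

    The checksum field already present in `segment` (bytes 6-7) is ignored.
    """
    zeroed = segment[:6] + b"\x00\x00" + segment[8:]
    pseudo = src_ip + dst_ip + bytes([0, 17]) + len(segment).to_bytes(2, "big")
    value = ~ones_complement_sum(pseudo + zeroed) & 0xFFFF
    return value or 0xFFFF  # a computed zero is transmitted as all ones


PFORTIN = bytes([192, 168, 134, 101])
BONES = bytes([192, 168, 134, 100])
# UDP header of packet 2317 (799 -> 2049, length 136, checksum 0xc281)
# followed by its 128 data bytes, copied from the dump below.
SEGMENT_2317 = bytes.fromhex(
    "03 1f 08 01 00 88 c2 81 "
    "d8 9a 43 fd 00 00 00 00 00 00 00 02 00 01 86 a3 "
    "00 00 00 02 00 00 00 06 00 00 00 01 00 00 00 2c "
    "00 23 95 1b 00 00 00 14 70 66 6f 72 74 69 6e 2e "
    "72 65 6d 69 6e 65 72 2e 68 6f 6d 65 00 00 01 f4 "
    "00 00 01 f5 00 00 00 01 00 00 01 f5 00 00 00 00 "
    "00 00 00 00 ca ba eb fe 9c 48 10 00 91 48 10 00 "
    "46 03 00 00 46 03 00 00 01 50 00 00 43 69 66 a8 "
    "00 00 00 00 00 20 b0 00 00 00 10 00 00 00 10 00"
)

if __name__ == "__main__":
    stored = int.from_bytes(SEGMENT_2317[6:8], "big")
    computed = udp_checksum(PFORTIN, BONES, SEGMENT_2317)
    print(f"stored {stored:#06x}, recomputed {computed:#06x}")
```

The recomputed value matches the stored 0xc281, so packet 2317 left pfortin intact; the corruption only shows up when its bytes resurface inside packet 3081.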
################################################################################
Packet: 002317/003096 Time: 19:40:04.496 Level: BYTES/ETHER/IP/UDP
+------------------------------------------------------------------------------+
| OSI-Level 1: Byte Level Packet size: 170 |
+------------------------------------------------------------------------------+
Packet size: 0x00aa
Time stamp: 0x04386410
Network Type: 1 Ethernet/802.3
+------------------------------------------------------------------------------+
| OSI-Level 2: Ethernet Packet size: 170 |
+------------------------------------------------------------------------------+
Source: 00:60:97:57:63:07 pfortin Vendor: unknown
Destination: 00:aa:00:cf:8d:65 bones Vendor: Intel
+------------------------------------------------------------------------------+
| OSI-Level 3: IP (Internet Protocol) Packet size: 156 |
+------------------------------------------------------------------------------+
Type Of Service: precedence = Routine Datagram ID: 29837
delay = normal IP Control Flags: don't fragment = no
throughput = normal more fragment = no
reliability = normal Checksum: 0x77a9
Fragment offset: 0 Time-To-Live: 64 hops
Protocol ID: 17 UDP Total length: 156
Source: 192.168.134.101 pfortin
Destination: 192.168.134.100 bones
Options: [no options]
+------------------------------------------------------------------------------+
| OSI-Level 4: UDP (User Datagram Protocol) Packet size: 136 |
+------------------------------------------------------------------------------+
Source Port: 799 Message Length: 136
Destination Port: 2049 nfs Checksum: 0xc281
Data: 0000: d8 9a 43 fd 00 00 00 00 00 00 00 02 00 01 86 a3 ..C.............
0010: 00 00 00 02 00 00 00 06 00 00 00 01 00 00 00 2c ...............,
0020: 00 23 95 1b 00 00 00 14 70 66 6f 72 74 69 6e 2e .#......pfortin.
0030: 72 65 6d 69 6e 65 72 2e 68 6f 6d 65 00 00 01 f4 reminer.home....
0040: 00 00 01 f5 00 00 00 01 00 00 01 f5 00 00 00 00 ................
0050: 00 00 00 00 ca ba eb fe 9c 48 10 00 91 48 10 00 .........H...H..
0060: 46 03 00 00 46 03 00 00 01 50 00 00 43 69 66 a8 F...F....P..Cif.
0070: 00 00 00 00 00 20 b0 00 00 00 10 00 00 00 10 00 ..... ..........