I just discovered a fully reproducible NFS problem...

Decided to install StarOffice on my PII-200MMX system; the main install worked
fine over NFS.  Then, when I went to copy over the SO5.2 patch files, I noticed
that 4 of the 18 files *always* ended the copy with: 
[pfortin@pfortin program]$ cp /usr/local/src/StarOffice/1*/p*/* .
cp: /usr/local/src/StarOffice/109939-02/program/libcnt569li.so: Input/output error
cp: /usr/local/src/StarOffice/109939-02/program/libsc569li.so: Input/output error
cp: /usr/local/src/StarOffice/109939-02/program/libsd569li.so: Input/output error
cp: /usr/local/src/StarOffice/109939-02/program/libsfx569li.so: Input/output error

It's always the same 4 files...  even if I copy them one at a time, only those
files are affected...

bones:/usr/local/src is NFS mounted by pfortin...

I repeated the copies several times and each time the bad files were the same
ones and the resultant sizes were identical.  Basically, the last NFS block gets
an I/O error...  confirmed by strace.

Sniffing the LAN on the server...  Note that the corrupted packet (003081
below) contains the data from a packet sent MUCH EARLIER (packet 002317: over
760 packets and about 10 seconds between these two!!).  This seems to indicate
a s/w bug...  I fail to see how the LAN or h/w adapters could hang on to this
data for so long...  Besides, it is repeatable and the results are consistent.


Packet: 003076/003096   Time: 19:40:14.050   Level: BYTES/ETHER/IP/UDP

| OSI-Level 1: Byte Level                                    Packet size: 1514 |
    Packet size: 0x05ea                                                         
     Time stamp: 0x04388962                                                     
   Network Type: 1 Ethernet/802.3                                               
| OSI-Level 2: Ethernet                                      Packet size: 1514 |
      Source: 00:aa:00:cf:8d:65  bones            Vendor: Intel                 
 Destination: 00:60:97:57:63:07  pfortin          Vendor: unknown               
| OSI-Level 3: IP (Internet Protocol)                        Packet size: 1500 |
 Type Of Service: precedence  = Routine      Datagram ID: 39862                 
                  delay       = normal  IP Control Flags: don't fragment = no   
                  throughput  = normal                    more fragment  = yes  
                  reliability = normal          Checksum: 0x2b40                
 Fragment offset: 0                         Time-To-Live: 64 hops               
     Protocol ID: 17 UDP                    Total length: 1500                  
          Source: bones                                         
     Destination: pfortin                                       
         Options: [no options]                                                  
| OSI-Level 4: UDP (User Datagram Protocol)                  Packet size: 1480 |
         Source Port:  2049  nfs          Message Length: 4204                  
    Destination Port:   799                     Checksum: 0xef56                
 Data: 0000: 92 9b 43 fd 00 00 00 01 00 00 00 00 00 00 00 00 ..C.............   
       0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 ................   
       05a0: 01 00 00 00 30 31 2e 30 31 00 00 00 08 00 00 00 ....01.01.......   
       05b0: 00 00 00 00 01 00 00 00 30 31 2e 30 31 00 00 00 ........01.01...   

Packet: 003077/003096   Time: 19:40:14.057   Level: BYTES/ETHER/IP/UDP

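This ACK packet is OK (strictly speaking it is the client's next NFS READ
call; NFS over UDP has no separate ACKs)...  BUT...
pay close attention to the underlined bytes (the ^^^ and === annotations in
packet 003081 below)...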

| OSI-Level 1: Byte Level                                    Packet size:  170 |
    Packet size: 0x00aa                                                         
     Time stamp: 0x04388969                                                     
   Network Type: 1 Ethernet/802.3                                               
| OSI-Level 2: Ethernet                                      Packet size:  170 |
      Source: 00:60:97:57:63:07  pfortin          Vendor: unknown               
 Destination: 00:aa:00:cf:8d:65  bones            Vendor: Intel                 
| OSI-Level 3: IP (Internet Protocol)                        Packet size:  156 |
 Type Of Service: precedence  = Routine      Datagram ID: 30028        
                  delay       = normal  IP Control Flags: don't fragment = no   
                  throughput  = normal                    more fragment  = no   
                  reliability = normal          Checksum: 0x76ea                
 Fragment offset: 0                         Time-To-Live: 64 hops               
     Protocol ID: 17 UDP                    Total length: 156                   
          Source: pfortin                                       
     Destination: bones                                         
         Options: [no options]                                                  
| OSI-Level 4: UDP (User Datagram Protocol)                  Packet size:  136 |
         Source Port:   799               Message Length: 136                   
    Destination Port:  2049  nfs                Checksum: 0x576b                
 Data: 0000: 93 9b 43 fd 00 00 00 00 00 00 00 02 00 01 86 a3 ..C.............   
       0010: 00 00 00 02 00 00 00 06 00 00 00 01 00 00 00 2c ...............,   
       0020: 00 23 95 25 00 00 00 14 70 66 6f 72 74 69 6e 2e .#.%....pfortin.   
       0030: 72 65 6d 69 6e 65 72 2e 68 6f 6d 65 00 00 01 f4 reminer.home....  
       0040: 00 00 01 f5 00 00 00 01 00 00 01 f5 00 00 00 00 ................   
       0050: 00 00 00 00 ca ba eb fe 9c 48 10 00 91 48 10 00 .........H...H..   
       0060: 46 03 00 00 46 03 00 00 01 50 00 00 43 69 66 a8 F...F....P..Cif.   
       0070: 00 00 00 00 00 2c 60 00 00 00 10 00 00 00 10 00 .....,`.........   

Packet: 003078/003096   Time: 19:40:14.057   Level: BYTES/ETHER/IP

| OSI-Level 1: Byte Level                                    Packet size:  250 |
    Packet size: 0x00fa                                                         
     Time stamp: 0x04388969                                                     
   Network Type: 1 Ethernet/802.3                                               
| OSI-Level 2: Ethernet                                      Packet size:  250 |
      Source: 00:aa:00:cf:8d:65  bones            Vendor: Intel                 
 Destination: 00:60:97:57:63:07  pfortin          Vendor: unknown               
| OSI-Level 3: IP (Internet Protocol)                        Packet size:  236 |
 Type Of Service: precedence  = Routine      Datagram ID: 39863                 
                  delay       = normal  IP Control Flags: don't fragment = no   
                  throughput  = normal                    more fragment  = no   
                  reliability = normal          Checksum: 0x4f76                
 Fragment offset: 185                       Time-To-Live: 64 hops               
     Protocol ID: 17 UDP                    Total length: 236                   
          Source: bones                                         
     Destination: pfortin                                       
         Options: [no options]                                                  
 Data: 0000: 00 00 00 00 00 00 00 00 04 00 00 00 04 00 00 00 ................   
       0010: 06 01 00 00 06 00 00 00 03 00 00 00 44 38 2c 00 ............D8,.   
       0020: 44 28 2c 00 a0 01 00 00 03 00 00 00 00 00 00 00 D(,.............   
       0030: 04 00 00 00 08 00 00 00 0f 01 00 00 08 00 00 00 ................   
       0040: 03 00 00 00 00 3a 2c 00 00 2a 2c 00 a8 29 00 00 .....:,..*,..)..   
       0050: 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00 00 ........ .......   
       0060: 14 01 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................   
       0070: 00 2a 2c 00 a0 23 00 00 00 00 00 00 00 00 00 00 .*,..#..........   
       0080: 01 00 00 00 00 00 00 00 1d 01 00 00 07 00 00 00 ................   
       0090: 00 00 00 00 a0 23 00 00 a0 4d 2c 00 c0 12 00 00 .....#...M,.....   
       00a0: 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................   
       00b0: 11 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 ................   
       00c0: 60 60 2c 00 23 01 00 00 00 00 00 00 00 00 00 00 ``,.#...........   
       00d0: 01 00 00 00 00 00 00 00                         ........           

Packet: 003079/003096   Time: 19:40:14.057   Level: BYTES/ETHER/IP/UDP

| OSI-Level 1: Byte Level                                    Packet size: 1514 |
    Packet size: 0x05ea                                                         
     Time stamp: 0x04388969                                                     
   Network Type: 1 Ethernet/802.3                                               
| OSI-Level 2: Ethernet                                      Packet size: 1514 |
      Source: 00:aa:00:cf:8d:65  bones            Vendor: Intel                 
 Destination: 00:60:97:57:63:07  pfortin          Vendor: unknown               
| OSI-Level 3: IP (Internet Protocol)                        Packet size: 1500 |
 Type Of Service: precedence  = Routine      Datagram ID: 39863                 
                  delay       = normal  IP Control Flags: don't fragment = no   
                  throughput  = normal                    more fragment  = yes  
                  reliability = normal          Checksum: 0x2b3f                
 Fragment offset: 0                         Time-To-Live: 64 hops               
     Protocol ID: 17 UDP                    Total length: 1500                  
          Source: bones                                         
     Destination: pfortin                                       
         Options: [no options]                                                  
| OSI-Level 4: UDP (User Datagram Protocol)                  Packet size: 1480 |
         Source Port:  2049  nfs          Message Length: 1696                  
    Destination Port:   799                     Checksum: 0xe629                
 Data: 0000: 93 9b 43 fd 00 00 00 01 00 00 00 00 00 00 00 00 ..C.............   
       0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 ................   
       05a0: 04 00 00 00 00 00 00 00 01 01 00 00 01 00 00 00 ................   
       05b0: 03 00 00 00 cc e2 2b 00 cc d2 2b 00 78 55 00 00 ......+...+.xU..   
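
As a sanity check on the two fragments with Datagram ID 39863 (packets 003078
and 003079, which show up on the wire in reverse order), here is a minimal
sketch that stitches them together.  It assumes 20-byte IP headers (both dumps
say "no options") and that the sniffer reports the fragment offset in the
usual 8-byte units; the function name is only illustrative.

```python
# Fragment fields copied from packets 003078 and 003079 (Datagram ID 39863).
IP_HEADER_LEN = 20   # assumption: no IP options, as both dumps report
UDP_HEADER_LEN = 8

fragments = [
    # (fragment offset in 8-byte units, IP total length, more-fragments flag)
    (185, 236, False),   # packet 003078: trailing fragment
    (0, 1500, True),     # packet 003079: leading fragment
]

def reassembled_length(frags):
    """Return the IP payload length once the fragments are laid end to end."""
    covered = 0
    for offset_units, total_len, _more in sorted(frags):
        start = offset_units * 8
        if start != covered:
            raise ValueError(f"gap or overlap at byte {start}")
        covered = start + (total_len - IP_HEADER_LEN)
    return covered

udp_len = reassembled_length(fragments)
print(udp_len)                    # UDP header plus RPC reply
print(udp_len - UDP_HEADER_LEN)   # RPC reply alone
```

The first number comes out as 1696, which agrees with the UDP "Message Length:
1696" shown in packet 003079, so the server's last reply looks complete and
self-consistent on the wire.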

Packet: 003080/003096   Time: 19:40:14.150   Level: BYTES/ETHER/IP/UDP

[snip: CUPS packet from bones:  Datagram ID: 39864]

Packet: 003081/003096   Time: 19:40:14.748   Level: BYTES/ETHER

| OSI-Level 1: Byte Level                                    Packet size:  170 |
    Packet size: 0x00aa                                                         
     Time stamp: 0x04388c1c                                                     
   Network Type: 1 Ethernet/802.3                                               
| OSI-Level 2: Ethernet                                      Packet size:  170 |
      Source: 01:f4:00:00:01:f5                   Vendor: unknown   
 Destination: 68:6f:6d:65:00:00                   Vendor: unknown      
 Protocol ID: 0000   Type: unknown                                              
 Data: 0000: 00 01 00 00 01 f5 00 00 00 00 00 00 00 00 ca ba ................ 
       0010: eb fe 9c 48 10 00 91 48 10 00 46 03 00 00 46 03 ...H...H..F...F.   
       0020: 00 00 01 50 00 00 43 69 66 a8 00 00 00 00 00 20 ...P..Cif......    
       0030: a0 00 00 00 10 00 00 00 10 00 08 00 10 a0 2e 57 ...............W   
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ?????????????????
       0040: aa 00 00 aa 00 cf 8d 65 00 60 97 57 63 07 08 00 .......e.`.Wc... 
             ????? ================= =================
                       To: bones       From: pfortin     IP
       0050: 45 00 00 9c 74 8d 00 00 40 11 77 a9 c0 a8 86 65 E...t...@.w....e  
             VL tos len #29837 fl/frgTTLudp cksm  S=pfortin

       0060: c0 a8 86 64 03 1f 08 01 00 88 c2 81 d8 9a 43 fd ...d..........C. 
              D=bones    ----------------------- ===========

       0070: 00 00 00 00 00 00 00 02 00 01 86 a3 00 00 00 02 ................  

       0080: 00 00 00 06 00 00 00 01 00 00 00 2c 00 23 95 1b ...........,.#..   

       0090: 00 00 00 14 70 66 6f 72 74 69 6e 2e             ....pfortin.       

The entrails (the embedded Ethernet, IP and UDP headers) indicate that the
above packet did come from pfortin and was intended for bones, apparently as
the ack packet for the last NFS block.
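
For anyone who wants to check my hand annotations, here is a minimal sketch
that decodes the frame embedded in the corrupted packet.  The bytes are copied
from offsets 0x0042 through 0x006f of the 003081 dump above; the function names
are only illustrative, and the checksum routine is the standard
ones'-complement header sum, not anything taken from the sniffer.

```python
import struct

# Bytes 0x0042 through 0x006f of the packet 003081 data dump, copied verbatim.
EMBEDDED = bytes.fromhex(
    "00aa00cf8d65" "006097576307" "0800"        # Ethernet: dst MAC, src MAC, type
    "4500009c748d0000401177a9c0a88665c0a88664"   # IP header, 20 bytes, no options
    "031f08010088c281"                           # UDP header, 8 bytes
    "d89a43fd"                                   # first 4 bytes of the UDP payload
)

def mac(raw):
    """Format six raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)

def ip_header_checksum(header):
    """Ones'-complement sum of the header with its checksum field zeroed."""
    zeroed = header[:10] + b"\x00\x00" + header[12:]
    total = sum(struct.unpack(f"!{len(zeroed) // 2}H", zeroed))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def decode_embedded(frame):
    """Pull the Ethernet, IP and UDP fields out of the embedded frame."""
    ip = frame[14:34]
    (_ver_ihl, _tos, total_len, ident, _frag, ttl, proto,
     cksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", ip)
    sport, dport, udp_len, udp_cksum = struct.unpack("!HHHH", frame[34:42])
    return {
        "eth_dst": mac(frame[0:6]),
        "eth_src": mac(frame[6:12]),
        "ethertype": struct.unpack("!H", frame[12:14])[0],
        "ip_total_len": total_len,
        "ip_id": ident,
        "ttl": ttl,
        "proto": proto,
        "ip_cksum": cksum,
        "ip_cksum_computed": ip_header_checksum(ip),
        "ip_src": ".".join(map(str, src)),
        "ip_dst": ".".join(map(str, dst)),
        "udp_sport": sport,
        "udp_dport": dport,
        "udp_len": udp_len,
        "udp_cksum": udp_cksum,
        "first_payload_word": frame[42:46].hex(),
    }

fields = decode_embedded(EMBEDDED)
for name, value in fields.items():
    print(f"{name:20} {value}")
```

The embedded header decodes to Datagram ID 29837, total length 156, UDP port
799 to 2049, and the stored IP checksum 0x77a9 matches the recomputed one, so
the stale frame is internally valid rather than random garbage.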

I see no relationship between the file sizes and the failures, nor any
apparent reason why these particular files are always bad.  I can FTP the
same files just fine...


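Here is the packet with IP Datagram ID 29837 (0x748d), sent MUCH earlier...

Note that it contains the same data as the frame embedded in the corrupted
packet above...  have a look at the checksums above and below: the IP checksum
(0x77a9) and the UDP checksum (0xc281) are identical...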

Packet: 002317/003096   Time: 19:40:04.496   Level: BYTES/ETHER/IP/UDP

| OSI-Level 1: Byte Level                                    Packet size:  170 |
    Packet size: 0x00aa                                                         
     Time stamp: 0x04386410                                                     
   Network Type: 1 Ethernet/802.3                                               
| OSI-Level 2: Ethernet                                      Packet size:  170 |
      Source: 00:60:97:57:63:07  pfortin          Vendor: unknown               
 Destination: 00:aa:00:cf:8d:65  bones            Vendor: Intel                 
| OSI-Level 3: IP (Internet Protocol)                        Packet size:  156 |
 Type Of Service: precedence  = Routine      Datagram ID: 29837                 
                  delay       = normal  IP Control Flags: don't fragment = no   
                  throughput  = normal                    more fragment  = no   
                  reliability = normal          Checksum: 0x77a9                
 Fragment offset: 0                         Time-To-Live: 64 hops               
     Protocol ID: 17 UDP                    Total length: 156                   
          Source: pfortin                                       
     Destination: bones                                         
         Options: [no options]                                                  
| OSI-Level 4: UDP (User Datagram Protocol)                  Packet size:  136 |
         Source Port:   799               Message Length: 136                   
    Destination Port:  2049  nfs                Checksum: 0xc281                
 Data: 0000: d8 9a 43 fd 00 00 00 00 00 00 00 02 00 01 86 a3 ..C.............   
       0010: 00 00 00 02 00 00 00 06 00 00 00 01 00 00 00 2c ...............,   
       0020: 00 23 95 1b 00 00 00 14 70 66 6f 72 74 69 6e 2e .#......pfortin.   
       0030: 72 65 6d 69 6e 65 72 2e 68 6f 6d 65 00 00 01 f4 reminer.home....   
       0040: 00 00 01 f5 00 00 00 01 00 00 01 f5 00 00 00 00 ................   
       0050: 00 00 00 00 ca ba eb fe 9c 48 10 00 91 48 10 00 .........H...H..   
       0060: 46 03 00 00 46 03 00 00 01 50 00 00 43 69 66 a8 F...F....P..Cif.   
       0070: 00 00 00 00 00 20 b0 00 00 00 10 00 00 00 10 00 ..... ..........
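
To see what pfortin was actually asking for in that earlier packet, here is a
minimal sketch that decodes its 128-byte UDP payload.  It assumes the payload
is an ONC RPC call with AUTH_UNIX credentials followed by NFS version 2 READ
arguments (a 32-byte file handle, then offset, count and totalcount); the
function name and dictionary keys are only illustrative.

```python
import struct

# UDP payload of packet 002317, copied row by row from the dump above.
PAYLOAD = bytes.fromhex(
    "d8 9a 43 fd 00 00 00 00 00 00 00 02 00 01 86 a3"
    "00 00 00 02 00 00 00 06 00 00 00 01 00 00 00 2c"
    "00 23 95 1b 00 00 00 14 70 66 6f 72 74 69 6e 2e"
    "72 65 6d 69 6e 65 72 2e 68 6f 6d 65 00 00 01 f4"
    "00 00 01 f5 00 00 00 01 00 00 01 f5 00 00 00 00"
    "00 00 00 00 ca ba eb fe 9c 48 10 00 91 48 10 00"
    "46 03 00 00 46 03 00 00 01 50 00 00 43 69 66 a8"
    "00 00 00 00 00 20 b0 00 00 00 10 00 00 00 10 00"
)

def pad4(n):
    """XDR pads opaque data to a multiple of four bytes."""
    return (n + 3) & ~3

def parse_read_call(buf):
    """Decode an RPC call header and (assumed) NFSv2 READ arguments."""
    xid, msg_type, rpc_vers, prog, vers, proc = struct.unpack_from("!6I", buf, 0)
    pos = 24
    cred_flavor, cred_len = struct.unpack_from("!2I", buf, pos)
    cred = buf[pos + 8:pos + 8 + cred_len]
    pos += 8 + pad4(cred_len)
    # AUTH_UNIX body: stamp, machine name, uid, gid, then the gid list.
    stamp, name_len = struct.unpack_from("!2I", cred, 0)
    machine = cred[8:8 + name_len].decode("ascii")
    uid, gid = struct.unpack_from("!2I", cred, 8 + pad4(name_len))
    _verf_flavor, verf_len = struct.unpack_from("!2I", buf, pos)
    pos += 8 + pad4(verf_len)
    fhandle = buf[pos:pos + 32]
    offset, count, totalcount = struct.unpack_from("!3I", buf, pos + 32)
    return {
        "xid": xid, "msg_type": msg_type, "rpc_vers": rpc_vers,
        "prog": prog, "vers": vers, "proc": proc,
        "cred_flavor": cred_flavor, "stamp": stamp,
        "machine": machine, "uid": uid, "gid": gid,
        "fhandle": fhandle.hex(),
        "offset": offset, "count": count, "totalcount": totalcount,
    }

call = parse_read_call(PAYLOAD)
for name, value in call.items():
    print(f"{name:12} {value}")
```

Under those assumptions the packet is a call (message type 0) to program
100003 (NFS), version 2, procedure 6 (READ), for 4096 bytes at file offset
0x20b000.  The same decoder can be pointed at the data of packet 003077 to
compare the two requests field by field.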
