Greetings,

   I am running Rocks 4.1 on a 30-node cluster and seeing serious RX packet
loss (drops and overruns) while running heavy MPI I/O over e1000. I have
replaced cabling and switches, updated the e1000 drivers, tried multiple kernels, etc., and none of these changes seems to affect the issue. I am pursuing a hardware resolution with Intel and Supermicro, but I am posting here in case someone has seen similar behavior.

   System details:
      30 nodes - Intel Pentium-D 840, 4GB RAM, 80GB SATA
            Supermicro PDSMI motherboard
            Intel 82573E and 82573L gigabit ethernet controllers
            (only one network connected)
            2.6.9-34.ELsmp and 2.6.16.11 kernels (both tried)
            e1000-7.0.38-1 driver

   Run details:
      mpirun -nolocal -np 18 -machinefile /home/test/machines.20-29 /home/test/IMB-MPI1 Alltoall -npmin 18 -msglen /home/test/Lengths
(msglen values of 32, 256, 512 and 1024 have each been run on their own, and every one of them results in packet drops)

  Packet drop example from one node (other nodes show similar numbers):
          RX packets:1843133 errors:0 dropped:1245 overruns:0 frame:0
          TX packets:1764828 errors:0 dropped:0 overruns:0 carrier:0
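To put the counters above in proportion, here is a small sketch that parses an ifconfig-style counter line and reports dropped packets as a percentage of the RX packet count. The function names are my own and hypothetical; it simply divides the "dropped" field by the "packets" field on the same line.

```python
import re


def parse_counters(line):
    """Parse an ifconfig-style line such as
    'RX packets:1843133 errors:0 dropped:1245 overruns:0 frame:0'
    into a dict of integer counters keyed by field name."""
    return {key: int(value) for key, value in re.findall(r"(\w+):(\d+)", line)}


def drop_percent(rx_line):
    """Dropped RX packets as a percentage of the RX packet count on that line."""
    counters = parse_counters(rx_line)
    return 100.0 * counters["dropped"] / counters["packets"]


if __name__ == "__main__":
    rx = "RX packets:1843133 errors:0 dropped:1245 overruns:0 frame:0"
    print(f"{drop_percent(rx):.4f}% of RX packets dropped")
```

For the node shown above this works out to roughly 0.07% of received packets, which is small in absolute terms but enough to stall an MPI Alltoall while the lost packets are retransmitted.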

   I have tried increasing the e1000 RxDescriptors value to the maximum
of 4096, thinking that the Alltoall test might be exhausting the receive
buffer resources, but the drops still occur.
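One way to check whether a change such as a larger RxDescriptors ring actually helps is to watch the per-interface counters while the benchmark runs, rather than comparing totals afterwards. The sketch below reads /proc/net/dev and prints how many RX packets, drops and fifo errors (the field ifconfig reports as "overruns") were added in each interval. It assumes the standard 2.6 /proc/net/dev column order; the helper names and the 5-second default interval are hypothetical choices.

```python
import time


def rx_stats(text, iface):
    """Return (rx_packets, rx_drop, rx_fifo) for iface from /proc/net/dev text.

    Assumes the usual column order after 'iface:': bytes packets errs drop fifo ...
    """
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip the two header lines
        name, data = line.split(":", 1)
        if name.strip() != iface:
            continue
        fields = [int(f) for f in data.split()]
        return fields[1], fields[3], fields[4]
    raise KeyError(iface)


def rx_delta(before, after, iface):
    """Counter increase for iface between two /proc/net/dev snapshots."""
    b, a = rx_stats(before, iface), rx_stats(after, iface)
    return tuple(x - y for x, y in zip(a, b))


def watch(iface, interval=5.0, path="/proc/net/dev"):
    """Print RX packet/drop/fifo increments every `interval` seconds (Ctrl-C to stop)."""
    with open(path) as f:
        prev = f.read()
    while True:
        time.sleep(interval)
        with open(path) as f:
            cur = f.read()
        pkts, drops, fifo = rx_delta(prev, cur, iface)
        print(f"{iface}: +{pkts} packets, +{drops} dropped, +{fifo} fifo/overruns")
        prev = cur
```

Running `watch("eth0")` on one compute node while IMB-MPI1 is active would show whether the drops arrive in bursts during particular message lengths or accumulate steadily.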

On Intel's advice I enabled ARP filtering (/proc/sys/net/ipv4/conf/all/arp_filter), but it did not change the behavior at all.

Any ideas?

--Jeff

_______________________________________________
Beowulf mailing list
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
