What's got me and the IT guys stumped is that while the compute nodes
boot via PXE from the head node without trouble on the NetGear, they
barf with the SMC.  To be specific, after the initial boot with a
minimal Linux kernel, there is a "fatal error" with "timeout waiting for
getfile" when the compute node attempts to download the provisioning
image from head.  However, when they were running Rocks before I
arrived, the cluster worked fine with the SMC switch.


This is very common with spanning tree enabled. Essentially, once the port has a physical link light, it may take a while before spanning tree allows traffic to actually flow through the port, and that wait is often longer than a typical timeout. Loading or reloading the driver seems to cause a momentary drop of the link, which forces a new delay cycle. That would fit the failure showing up only after the minimal kernel has booted.
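
To put rough numbers on it, here is a small sketch of the arithmetic. It assumes classic 802.1D spanning tree with the default forward delay of 15 seconds, where a port sits in the listening and then the learning state for one forward delay each before it forwards traffic. The function names and the example client timeouts are made up for illustration; I don't know the actual getfile timeout on your nodes or how the SMC is configured.

```python
# Sketch only: estimates how long a classic 802.1D port blocks traffic
# after link-up, and whether a client with a given timeout would give up first.
# The 15 s default forward delay is the 802.1D default; the client timeouts
# used in the example are hypothetical, not taken from the provisioning tool.

def stp_forwarding_delay(forward_delay_s: float = 15.0) -> float:
    """Seconds spent in listening + learning before the port forwards."""
    return 2 * forward_delay_s


def will_time_out(client_timeout_s: float, forward_delay_s: float = 15.0) -> bool:
    """True if the client gives up before the port starts forwarding."""
    return client_timeout_s < stp_forwarding_delay(forward_delay_s)


if __name__ == "__main__":
    print("port blocked for about", stp_forwarding_delay(), "seconds")
    for timeout in (10, 60):  # hypothetical client timeouts in seconds
        print(timeout, "s timeout fails:", will_time_out(timeout))
```

If the link drops again when the driver reloads, that clock restarts, so a client can hit the delay more than once during a single boot.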

With the Dell PowerConnect series (an SMC rebrand?) you have to enable portfast or disable spanning tree to avoid this delay before traffic passes. I generally do both. The web-based GUI is bad enough to make this more difficult than it needs to be, but you can globally disable spanning tree through it. I use the command line instead, select all ports with an interface range, and then configure them as:

!
enable
config
interface range ethernet all
spanning-tree disable
spanning-tree portfast
mtu 9216
exit
!

Hope this helps!

Cheers!
Greg

Technical Principal
R Systems NA, inc.





_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
