What's got me and the IT guys stumped is that while the compute nodes
boot via PXE from the head node without trouble on the NetGear, they
barf with the SMC. To be specific, after the initial boot with a
minimal Linux kernel, there is a "fatal error" with "timeout waiting for
getfile" when the compute node attempts to download the provisioning
image from head. However, when they were running Rocks before I
arrived, the cluster worked fine with the SMC switch.

This is very common with spanning tree enabled. Essentially, once the
port has a physical link light, it may take a while before spanning
tree allows traffic to actually flow through the port, often longer
than a typical timeout. Loading or reloading the driver seems to cause
a momentary drop of the link, which forces a new delay cycle.
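
To put rough numbers on that delay, here is a small sketch. It assumes
classic IEEE 802.1D spanning tree with the default forward delay of
15 seconds, where a port sits in the listening and then the learning
state for one forward delay each before it forwards traffic. I don't
know what your switch is actually set to, or what the provisioning
client's timeout is, so both values are parameters, and the function
names are just made up for illustration.

```python
# Assumption: classic 802.1D default forward delay. The switch's real
# setting may differ, so treat this as a parameter, not a fact.
DEFAULT_FORWARD_DELAY_S = 15


def stp_forwarding_delay(forward_delay_s=DEFAULT_FORWARD_DELAY_S):
    """Seconds a port spends in listening + learning before forwarding."""
    return 2 * forward_delay_s


def download_would_time_out(client_timeout_s,
                            forward_delay_s=DEFAULT_FORWARD_DELAY_S):
    """True if the client gives up before the port starts forwarding.

    client_timeout_s is a hypothetical value: how long the provisioning
    client waits for its first reply after the link comes (back) up.
    """
    return stp_forwarding_delay(forward_delay_s) > client_timeout_s
```

With the defaults, any client that gives up in under 30 seconds after
the link comes back would never see its download start.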
With the Dell PowerConnect (SMC rebrand??) series you have to "enable"
port fast or "disable" spanning tree to avoid this delay before
traffic passes. I generally do both. The web-based GUI is bad enough
to make this more difficult than it needs to be, but you can globally
disable spanning tree through it. I use the command line, select all
ports with interface range, and then configure my ports as:
!
enable
config
interface range ethernet all
spanning-tree disable
spanning-tree portfast
mtu 9216
exit
!
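
If you want to check whether the change made a difference, something
like the following rough sketch can time how long it takes before a
freshly linked node can actually reach the head node. It is only an
illustration: the host name "head" and TCP port 22 in the usage comment
are placeholders I picked, not anything from your setup, and a TCP
connect is just one convenient way to test that traffic is passing.

```python
import socket
import time


def tcp_probe(host, port, timeout_s=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def seconds_until_reachable(probe, limit_s=60.0, interval_s=1.0,
                            clock=time.monotonic, sleep=time.sleep):
    """Call probe() repeatedly until it returns True.

    Returns the elapsed seconds at the first success, or None if
    limit_s passes without one. clock and sleep are injectable so the
    loop can be exercised without real waiting.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= limit_s:
            return None
        sleep(interval_s)


# Example use right after the link comes up (placeholder host and port):
#   delay = seconds_until_reachable(lambda: tcp_probe("head", 22))
#   print("unreachable" if delay is None else f"reachable after {delay:.1f}s")
```

A result of roughly 30 seconds before the change and close to zero
after it would point at spanning tree as the culprit.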
Hope this helps!
Cheers!
Greg
Technical Principal
R Systems NA, inc.
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf