Hi all,

Thanks for your responses!  I finally fixed this yesterday afternoon but 
neglected to update my post; my apologies.
  
After discussing our problem with the Penguin Computing service rep, I 
reconfigured the switch to enable fast spanning-tree mode on the compute node 
ports.  That apparently fixed the problem, and thanks to your feedback I am 
starting to understand why.
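
For anyone who finds this thread later, here is a rough sketch of why the 
change probably helps.  It assumes the switch follows the classic 802.1D 
defaults (15 s forward delay, so a port sits about 30 s in listening and 
learning after every link-up) and that the link resets when the minimal 
kernel loads its NIC driver.  The client timeout is a made-up number, not 
something we measured, and the function names are only for illustration.

```python
# Sketch only: models spanning-tree port delay against a client timeout.
# Assumes IEEE 802.1D default timers; the real switch may be configured differently.

FORWARD_DELAY_S = 15.0  # 802.1D default forward delay, in seconds


def seconds_until_forwarding(fast_mode: bool,
                             forward_delay: float = FORWARD_DELAY_S) -> float:
    """Time after link-up before the port passes ordinary traffic."""
    if fast_mode:
        # A fast/edge port skips the listening and learning states.
        return 0.0
    # Classic STP: one forward-delay interval in listening, one in learning.
    return 2 * forward_delay


def download_succeeds(client_timeout: float, fast_mode: bool) -> bool:
    """True if the port is forwarding before the client gives up."""
    return seconds_until_forwarding(fast_mode) < client_timeout


# HYPOTHETICAL value: the actual "getfile" timeout was not measured.
ASSUMED_CLIENT_TIMEOUT_S = 20.0

if __name__ == "__main__":
    for fast in (False, True):
        delay = seconds_until_forwarding(fast)
        ok = download_succeeds(ASSUMED_CLIENT_TIMEOUT_S, fast)
        print(f"fast_mode={fast}: port forwards after {delay:.0f} s, download ok: {ok}")
```

If the node's real timeout is shorter than the port's delay, the image 
download fails even though the cabling, DHCP and TFTP are all fine, which 
would match what we saw.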

Thanks again,
- Art.

On Dec 2, 2009, at 10:30 AM, Joe Landman wrote:

> Art Poon wrote:
>> Dear colleagues,
> 
> [...]
> 
>> What's got me and the IT guys stumped is that while the compute nodes
>> boot via PXE from the head node without trouble on the NetGear, they
>> barf with the SMC.  To be specific, after the initial boot with a
>> minimal Linux kernel, there is a "fatal error" with "timeout waiting
>> for getfile" when the compute node attempts to download the
>> provisioning image from head.  However, when they were running Rocks
>> before I arrived, the cluster worked fine with the SMC switch.
> 
> Is it the switch or the dhcp/bootp/tftp setup that's the problem?  Are you 
> sure the tftp daemon is up, or bootp is configured correctly?
> 
> Switches sometimes have broadcast storm suppression turned on, or worse, 
> sometimes they have spanning tree turned on.  You want the switch to be as 
> dumb as you can possibly make it for most linux clusters.  Fast, but dumb.
> 
>> I've tried resetting the SMC switch to factory defaults (with
>> auto-negotiate on).  I've checked the /etc/beowulf/modprobe.conf and
>> it doesn't seem to be demanding anything exotic.  We've tried
>> swapping out to another SMC switch but that didn't change anything.
> 
> This sounds more on the server software stack than the switch.  Could you 
> describe this?  Are you using Scyld/Rocks for that?
> 
> Rocks is quite sensitive to configuration issues, and really doesn't like 
> altered configurations (it is possible to do, though non-trivial).
> 
> -- 
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics Inc.
> email: [email protected]
> web  : http://scalableinformatics.com
>       http://scalableinformatics.com/jackrabbit
> phone: +1 734 786 8423 x121
> fax  : +1 866 888 3112
> cell : +1 734 612 4615


_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
