> What's got me and the IT guys stumped is that while the compute nodes boot via PXE from the head node without trouble on the NetGear, they barf with the SMC. To be specific, after the initial boot with a minimal Linux kernel, there is a "fatal error" with "timeout waiting for getfile" when the compute node attempts to download the provisioning image from head. However, when they were running Rocks before I arrived, the cluster worked fine with the SMC switch.

Use tcpdump or some equivalent. Run it once with the dumb switch, once with the managed one, and then compare and contrast.

> I've tried resetting the SMC switch to factory defaults (with auto-negotiate on). I've checked the /etc/beowulf/modprobe.conf and it doesn't seem to be demanding anything exotic. We've tried swapping out to another SMC switch but that didn't change anything.

Detach from the world at large, then turn off the firewall on the master. (Probably not it this time, but whenever there are network problems, always rule out the firewall before spending time on anything else.)

IPv6 vs. IPv4? By which I mean, once the kernel boots, perhaps it switches to IPv6, which the NetGear handles properly, but maybe that is turned off on the SMC?

Regards,

David Mathog
[email protected]
Manager, Sequence Analysis Facility, Biology Division, Caltech
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
