We've had that happen on some of our servers. Currently using the disable_msi workaround, which seems to have stopped it. I believe there's supposed to be a fix in the latest Red Hat kernel but we haven't really tested that yet.
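
For anyone wanting to check whether the workaround is in place on a node, here is a minimal sketch. It assumes the workaround is applied as an `options bnx2 disable_msi=1` line in a modprobe configuration file. That form is an assumption on my part, so check it against the bug reports quoted below. The function name `bnx2_msi_disabled` is made up for illustration.

```python
def bnx2_msi_disabled(conf_text: str) -> bool:
    """Return True if an 'options bnx2' line sets disable_msi=1.

    conf_text is the content of a modprobe-style config file.
    Assumption: the workaround is written as 'options bnx2 disable_msi=1'.
    """
    for raw in conf_text.splitlines():
        # Drop trailing comments, then split the line into whitespace-separated fields.
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "options" and fields[1] == "bnx2":
            if "disable_msi=1" in fields[2:]:
                return True
    return False


if __name__ == "__main__":
    import sys

    # Usage: python check_bnx2.py <path to modprobe config>
    with open(sys.argv[1]) as fh:
        print(bnx2_msi_disabled(fh.read()))
```

On CentOS 5 the file to pass in would probably be /etc/modprobe.conf, but that path is an assumption too. The check only reads the configuration; it does not tell you whether the running driver was actually loaded with that option.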

You lose all network connectivity to the server (including IPMI), but not every way in: a serial console (not SOL; a proper serial console, or a console server) still works, as would a locally attached keyboard/monitor. Unless you require network to log in :) . If you run into this, it's a really weird one (before you find the bug report): to all appearances the server is working happily, with no strangeness in the logs, just the network gone completely.

It's not easy to trigger, which makes it hard to track down. We'd had 610s and 710s for a while before this first happened (and there are still loads we've never seen it on). We first saw it on a rather heavily used NFS server (i.e. lots of network I/O).

Tina


Cris Rhea wrote:
In case it helps anyone using Dell R410 / 610 / 710 etc. servers: I have had
machines lose their eth connections periodically (CentOS 5.4 bnx2 driver).
Seems like a bug with the Broadcom NIC drivers. [luckily read of it on a
Dell mailing list]

Bug Reports:

http://kbase.redhat.com/faq/docs/DOC-26837
http://patchwork.ozlabs.org/patch/51106

Not sure yet if this is exactly my issue but I'm giving it a shot now.
Thought I'd post since, anecdotally, I've seen many people use these servers
on the list.

--
Rahul

I've been following this on the Dell list as I have approx. 50 R410s in our cluster.

One thing that isn't clear-- When this happens, do you lose all connectivity to the node (i.e., do you have to reboot the node to re-establish eth0)?

My R410s are running CentOS 5.2 - 5.4 and I rarely have one go down.

--- Cris




--
Tina Friedrich, Computer Systems Administrator, Diamond Light Source Ltd
Diamond House, Harwell Science and Innovation Campus - 01235 77 8442
_______________________________________________
Beowulf mailing list, [email protected] sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf