On Fri, Oct 17, 2008 at 10:22 AM, Nifty niftyompi Mitch <[EMAIL PROTECTED]> wrote:
> Check the baseboard management controller log (Ctrl+E).
>
> Tell us what software distribution you are running and any changes that might have
> been made (no matter how small). What is the default run level (is X11
> active/ not active).
> Are power saving options enabled in the BIOS?

Distro: CentOS 5.2.

Linux node03 2.6.18-92.el5 #1 SMP Tue Jun 10 18:51:06 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

No changes made to the standard kernel. X11 not active. Power saving not enabled.

> Also what hardware monitor software are you running. I have seen system admins add
> their own package to systems only to find that RHEL has an equivalent package
> that uses different device drivers for the same hardware with impossible to diagnose
> results. Custom kernel?

I am not sure what you mean by "hardware monitor software". I do not recall
installing anything special.

> Disable cpuspeed, hardware monitor and hardware control software to see if
> stability changes.

There are a bunch of Dell utilities that come up at boot time: BMC, RAID,
Broadcom-PXE, remote management controllers. Do you want me to disable those?

Stability has already changed, after I swapped the motherboard and CPU. No more
dead nodes in over 2 weeks now (yay!). But I just want to make sure this won't
be a recurring problem with these SC1435s before we go in for our next
expansion.

> What additional hardware is in the chassis?

None other than what came with the original Dell units. These are only 2 months
old now. They have dual NICs and disks, and no CD-ROMs. They are linked to a
Dell KVM via a SIP module. No mix-n-matching of hardware; it was a monolithic
Dell order.

> The "poweredge indicator turning orange" tells me that the problem was detected by the
> system and there should be a hint in the log. The orange state is sticky and
> needs to be cleared....

Funny. It wasn't sticky for me. When I rebooted, the orange light cleared. I did
not need to reset it via the BIOS.
Unfortunately the SC series does not have the tiny LCD for an error display.

-- 
Rahul
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf