On Mon, Apr 19, 2010 at 02:53:21PM -0500, [email protected] wrote:
> Chris,
> I've seen other HPC customers who have had thermal issues,
> especially with systems in the top of the rack. If you have any
> leakage of air from the hot aisle into the cold aisle, it would be
> possible the inlet (ambient) temperature for a system could be higher
> than you realize. I don't have a 1435 and don't have the specs in
> front of me, but I would think 71F is within the operating range of the
> system. If not, it is barely outside the operating range. I believe
> the 1435 has a Baseboard Management Controller (BMC) that records
> hardware events into the System Event Log (SEL). You should be able
> to view the SEL during POST by pressing CTRL-E. You can also view the
> SEL through IPMI Tool or OMSA. I would check the SEL for any
> events, especially for thermal sensors.
>
> Wayne Weilnau
> Systems Management Technologist
> Dell | OpenManage Software Development

The affected systems are spread from bottom to top of the rack. Yes, our
hot/cold aisle separation is a bit sloppy (we don't have under-floor cold
air), but if that were the cause I figured I'd see a pattern like the one
you suggest (e.g., systems at the top of the rack). The temperature reading
comes from a low-tech thermometer on the front door of the rack at eye
level.

The only place I see these errors is in the SEL. After powering the
machines back up, I press CTRL-E and look at the SEL. I get simple
messages like "CPUx thermal tripped asserted".

I've taken one system apart and redone the thermal goo between the CPU
and heatsink. That didn't help. I then replaced the MB, and it has behaved
since. Perhaps, if this isn't a common problem, I really do just have 8
more systems with bad thermal sensors on the MB.

--
Cris

--
 Cristopher J. Rhea
 Mayo Clinic - Research Computing Facility
 200 First St SW, Rochester, MN 55905
 [email protected]
 (507) 284-0587
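
For checking the SEL on several nodes without rebooting into the CTRL-E screen, a rough sketch along these lines might help. It assumes `ipmitool` is installed, that the local BMC is reachable (usually as root), and that `ipmitool sel elist` prints pipe-delimited lines. The function names, the keyword list, and the sample entries are all made up for illustration and are not taken from a real 1435.

```python
import subprocess

# Keywords used to spot thermal entries (an assumption; adjust to your SEL text).
KEYWORDS = ("therm", "temperature")

def read_sel():
    """Return raw SEL text from the local BMC (assumes ipmitool is installed)."""
    result = subprocess.run(
        ["ipmitool", "sel", "elist"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def parse_sel(text):
    """Split pipe-delimited SEL lines into lists of stripped fields."""
    entries = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) >= 4:  # skip blank or malformed lines
            entries.append(fields)
    return entries

def thermal_events(entries):
    """Keep only entries where some field mentions a thermal keyword."""
    return [
        e for e in entries
        if any(k in f.lower() for f in e for k in KEYWORDS)
    ]

# Hypothetical sample in the assumed output format, for illustration only.
SAMPLE = """\
   1 | 04/19/2010 | 09:12:44 | Processor #0x60 | Thermal Trip | Asserted
   2 | 04/19/2010 | 09:20:03 | Power Supply #0x64 | Presence detected | Asserted
"""

if __name__ == "__main__":
    # Swap SAMPLE for read_sel() on a machine with a BMC.
    for event in thermal_events(parse_sel(SAMPLE)):
        print(" | ".join(event))
```

Running the same filter on each node and comparing timestamps would show whether the trips cluster by rack position or time of day, or whether they look random, which would point more toward the boards themselves.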
