I don't know much about that to be honest. I don't think I would worry too much about it. All of our rack mounts appear to have the same behavior. They are above the "Lower critical", however.
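If it helps, here is a rough Python sketch of the check I am describing: compare each fan's reading with its "Lower critical" threshold and ignore the "Normal Minimum"/"Normal Maximum" fields. The function names (parse_sensors, leading_number, check_fans) are made up for illustration, and the sample text is just the FAN 1 block from the verbose output quoted further down, so treat it as a starting point rather than a finished tool.

```python
import re

# Sample copied from the "ipmitool -v sdr type Fan" output quoted in this thread.
SAMPLE = """\
Sensor ID              : FAN 1 RPM (0x30)
 Entity ID             : 7.1 (System Board)
 Sensor Type (Analog)  : Fan
 Sensor Reading        : 1320 (+/- 120) RPM
 Status                : ok
 Nominal Reading       : 10080.000
 Normal Minimum        : 16680.000
 Normal Maximum        : 23640.000
 Lower critical        : 720.000
"""


def parse_sensors(text):
    """Split verbose SDR text into one dict of 'key: value' fields per sensor."""
    sensors = []
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Sensor ID":
            # A "Sensor ID" line starts a new sensor record.
            sensors.append({})
        if sensors:
            sensors[-1][key] = value
    return sensors


def leading_number(value):
    """Return the number at the start of a field, or None if there is none."""
    match = re.match(r"-?\d+(?:\.\d+)?", value)
    return float(match.group()) if match else None


def check_fans(text):
    """Return (sensor id, reading, lower critical, reading above threshold?) tuples."""
    results = []
    for sensor in parse_sensors(text):
        reading = leading_number(sensor.get("Sensor Reading", ""))
        lower_critical = leading_number(sensor.get("Lower critical", ""))
        if reading is None or lower_critical is None:
            # Skip sensors without a numeric reading or without this threshold.
            continue
        results.append(
            (sensor["Sensor ID"], reading, lower_critical, reading > lower_critical)
        )
    return results


if __name__ == "__main__":
    for name, reading, lower_critical, ok in check_fans(SAMPLE):
        state = "above" if ok else "AT OR BELOW"
        print(f"{name}: {reading:.0f} RPM is {state} lower critical ({lower_critical:.0f})")
```

To use it on a live box you would feed it the text from "ipmitool -v sdr type Fan" instead of SAMPLE. I have only thought this through against the one sensor block above, so other sensor types or firmware versions may need tweaks.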
I think at this point these are the steps I would try:
* Reset the iDRAC/BMC. Sometimes this fixes issues. Either "racadm racreset" over ssh to idrac or "ipmitool mc reset cold" from the OS. Pulling all the power plugs will do the same thing
* Reseat the power supplies if possible since one appears to be spinning really fast. It should be easy to hear which one is the fast one
* Check thumb screws on heat sinks
* Check thermal paste on the CPU to see if it was poorly distributed. Whatever the result is, you will need to reapply thermal paste once you check it.

On 12/08/2010 04:42 PM, Erich Weiler wrote:
> Hmm...
>
> [r...@server ~]# ipmitool -v sdr type Fan
> Sensor ID              : FAN 1 RPM (0x30)
>  Entity ID             : 7.1 (System Board)
>  Sensor Type (Analog)  : Fan
>  Sensor Reading        : 1320 (+/- 120) RPM
>  Status                : ok
>  Nominal Reading       : 10080.000
>  Normal Minimum        : 16680.000
>  Normal Maximum        : 23640.000
>  Lower critical        : 720.000
>  Positive Hysteresis   : 600.000
>  Negative Hysteresis   : 600.000
>  Minimum sensor range  : Unspecified
>  Maximum sensor range  : Unspecified
>  Event Message Control : Per-threshold
>  Readable Thresholds   : lcr
>  Settable Thresholds   :
>  Threshold Read Mask   : lcr
>  Assertion Events      :
>  Assertions Enabled    : lcr-
>  Deassertions Enabled  : lcr-
> ...clip...
>
> Does that mean that my speed of 1320 RPM is below the 'normal minimum'
> of 16680.000 RPM? That's quite a difference, if so... It shows that
> for almost all of the fans.
>
> On 12/08/10 14:37, Ryan Cox wrote:
>> That is interesting. That appears to be showing two sets of fans
>> (the 7.1s and 10.*). Do you have 4 power supplies in those? I don't
>> know off the top of my head how many PSUs an R910 takes. 10.* in
>> ipmitool is for power supplies and 7 is for the system board (see
>> "ipmitool sdr entity help"). Maybe reseat them one at a time if you
>> have enough power?
>>
>> You could try the ipmitool command with "-v" to see more information
>>
>> Erich Weiler wrote:
>>> Very useful:
>>>
>>> [r...@server ~]# ipmitool sdr type Fan
>>> FAN 1 RPM        | 30h | ok  |  7.1 | 1320 RPM
>>> FAN 2 RPM        | 31h | ok  |  7.1 | 1320 RPM
>>> FAN 3 RPM        | 32h | ok  |  7.1 | 1440 RPM
>>> FAN 4 RPM        | 33h | ok  |  7.1 | 1680 RPM
>>> FAN 5 RPM        | 34h | ok  |  7.1 | 1560 RPM
>>> FAN 6 RPM        | 35h | ok  |  7.1 | 1680 RPM
>>> Fan RPM          | 36h | ok  | 10.1 | 3480 RPM
>>> Fan RPM          | 37h | ok  | 10.2 | 10080 RPM
>>> Fan RPM          | 38h | ok  | 10.3 | 3120 RPM
>>> Fan RPM          | 39h | ok  | 10.4 | 2160 RPM
>>> Fan Redundancy   | 75h | ok  |  7.1 | Fully Redundant
>>>
>>> I wonder why one fan is so fast while the others are slower. I'm
>>> beginning to think the BIOS might be the next step, to check Fan
>>> speed options...
>>>
>>> On 12/08/10 13:55, Ryan Cox wrote:
>>>> We don't use OMSA here but do use ipmitool extensively. This may
>>>> get you what you need.
>>>>
>>>> Load the following kernel modules first: ipmi_si, ipmi_devintf,
>>>> ipmi_msghandler
>>>> Give it a few seconds and then run:
>>>> ipmitool sdr type Fan
>>>>
>>>> It can also be run remotely against an iDRAC (or BMC).
>>>>
>>>> We have had thermal issues before and it was almost always the
>>>> result of thumbscrews that weren't in all the way. There
>>>> definitely could be a different issue though.
>>>>
>>>> Ryan
>>>>
>>>> On 12/08/2010 02:49 PM, Erich Weiler wrote:
>>>>> Yeah, I tried OMSA, but for the life of me could not get it to
>>>>> read anything from the IPMI/BIOS interfaces. No idea why. I may
>>>>> just have to reboot and go into the BIOS manually and see what I
>>>>> can see there.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> On 12/08/10 13:49, Bond Masuda wrote:
>>>>>> Have you checked the fan speeds? Are they at full throttle? I don't know
>>>>>> much about the R910, but usually you can get fan speed readings
>>>>>> from OMSA.
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: [email protected]
>>>>>> [mailto:[email protected]] On Behalf Of Erich Weiler
>>>>>> Sent: Wednesday, December 08, 2010 1:40 PM
>>>>>> To: Ryan Cox
>>>>>> Cc: [email protected]
>>>>>> Subject: Re: R910/Linux CPU Heat Problems?
>>>>>>
>>>>>>> Just so you know, the kernel is merely responding to interrupts
>>>>>>> from the processor cores themselves saying they are over
>>>>>>> temperature. The cores have their thresholds set and the kernel
>>>>>>> can't and doesn't mess with them. If the kernel reports the
>>>>>>> processors are hot, the processors are actually hot.
>>>>>>
>>>>>> Ah, good to know. It may be that the air is simply not cool
>>>>>> enough in the datacenter, but this would be the first time I've
>>>>>> ever seen this with any of our servers. I'll double check the
>>>>>> screws and fans and see if that might be an issue...
>>>>>>
>>>>>> _______________________________________________
>>>>>> Linux-PowerEdge mailing list
>>>>>> [email protected]
>>>>>> https://lists.us.dell.com/mailman/listinfo/linux-poweredge
>>>>>> Please read the FAQ at http://lists.us.dell.com/faq
>>>>>>
>>>>
>>

--
Ryan Cox
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
