To be honest, I don't know much about the "Normal Minimum" and "Normal 
Maximum" fields, and I wouldn't worry too much about them.  All of our 
rack-mount servers show the same behavior.  Your readings are above the 
"Lower critical" threshold, however.
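
If you want to check that margin on every sensor without reading the 
verbose output by eye, here is a minimal sketch in Python.  It assumes 
the "name : value" layout of the "ipmitool -v sdr type Fan" output 
quoted below; the function names are my own, and the sample text is 
trimmed from your FAN 1 record.

```python
import re

# Trimmed from the FAN 1 record in the quoted "ipmitool -v sdr type Fan" output.
SAMPLE = """\
Sensor ID              : FAN 1 RPM (0x30)
 Sensor Reading        : 1320 (+/- 120) RPM
 Normal Minimum        : 16680.000
 Lower critical        : 720.000
"""


def parse_fields(text):
    """Return a dict of the 'name : value' pairs in one verbose sensor record."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


def leading_number(value):
    """Return the number at the start of a field value, or None if there is none."""
    match = re.match(r"[-+]?\d+(?:\.\d+)?", value)
    return float(match.group()) if match else None


def margin_above_lower_critical(text):
    """Return the sensor reading minus the lower critical threshold, in RPM."""
    fields = parse_fields(text)
    reading = leading_number(fields["Sensor Reading"])
    lower_critical = leading_number(fields["Lower critical"])
    return reading - lower_critical


print(margin_above_lower_critical(SAMPLE))
```

A positive result means the fan is still above the threshold that would 
trigger a critical event; it says nothing about whether the "Normal" 
fields are meaningful.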

At this point, these are the steps I would try:
* Reset the iDRAC/BMC, which sometimes clears up issues like this.  Run 
either "racadm racreset" over ssh to the iDRAC or "ipmitool mc reset 
cold" from the OS.  Pulling all the power plugs will do the same thing.
* Reseat the power supplies if possible, since the fan in one of them 
appears to be spinning really fast.  It should be easy to hear which 
one it is.
* Check that the thumbscrews on the heat sinks are in all the way.
* Check whether the thermal paste on the CPU was poorly distributed. 
Whatever you find, you will need to reapply thermal paste once you have 
checked it.
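
To pick out the fast power supply fan from the readings rather than by 
ear, something like the following Python sketch might help.  It parses 
the compact "ipmitool sdr type Fan" listing quoted further down, groups 
fans by entity (7 for the system board, 10 for the power supplies), and 
flags any fan running at more than twice the median of its group.  The 
factor of 2 and the function names are assumptions for illustration, 
not anything Dell or ipmitool defines.

```python
from collections import defaultdict
from statistics import median

# Copied from the quoted "ipmitool sdr type Fan" output.
SAMPLE = """\
FAN 1 RPM        | 30h | ok  |  7.1 | 1320 RPM
FAN 2 RPM        | 31h | ok  |  7.1 | 1320 RPM
FAN 3 RPM        | 32h | ok  |  7.1 | 1440 RPM
FAN 4 RPM        | 33h | ok  |  7.1 | 1680 RPM
FAN 5 RPM        | 34h | ok  |  7.1 | 1560 RPM
FAN 6 RPM        | 35h | ok  |  7.1 | 1680 RPM
Fan RPM          | 36h | ok  | 10.1 | 3480 RPM
Fan RPM          | 37h | ok  | 10.2 | 10080 RPM
Fan RPM          | 38h | ok  | 10.3 | 3120 RPM
Fan RPM          | 39h | ok  | 10.4 | 2160 RPM
Fan Redundancy   | 75h | ok  |  7.1 | Fully Redundant
"""


def parse_fans(text):
    """Return (name, sensor_id, entity, rpm) tuples for rows that report RPM."""
    fans = []
    for line in text.splitlines():
        cols = [col.strip() for col in line.split("|")]
        # Skip malformed rows and non-RPM rows such as "Fan Redundancy".
        if len(cols) != 5 or not cols[4].endswith("RPM"):
            continue
        name, sensor_id, _status, entity, reading = cols
        fans.append((name, sensor_id, entity, int(reading.split()[0])))
    return fans


def fast_outliers(fans, factor=2.0):
    """Return fans spinning faster than factor times their entity group's median."""
    groups = defaultdict(list)
    for fan in fans:
        # "10.2" -> "10": compare power supply fans only with each other.
        groups[fan[2].split(".")[0]].append(fan)
    flagged = []
    for members in groups.values():
        typical = median(fan[3] for fan in members)
        flagged.extend(fan for fan in members if fan[3] > factor * typical)
    return flagged


for name, sensor_id, entity, rpm in fast_outliers(parse_fans(SAMPLE)):
    print(f"{name} ({sensor_id}, entity {entity}) is running at {rpm} RPM")
```

On your numbers this singles out sensor 37h at entity 10.2, so that 
power supply would be the first one to reseat.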

On 12/08/2010 04:42 PM, Erich Weiler wrote:
> Hmm...
>
> [r...@server ~]# ipmitool -v sdr type Fan
> Sensor ID              : FAN 1 RPM (0x30)
>  Entity ID             : 7.1 (System Board)
>  Sensor Type (Analog)  : Fan
>  Sensor Reading        : 1320 (+/- 120) RPM
>  Status                : ok
>  Nominal Reading       : 10080.000
>  Normal Minimum        : 16680.000
>  Normal Maximum        : 23640.000
>  Lower critical        : 720.000
>  Positive Hysteresis   : 600.000
>  Negative Hysteresis   : 600.000
>  Minimum sensor range  : Unspecified
>  Maximum sensor range  : Unspecified
>  Event Message Control : Per-threshold
>  Readable Thresholds   : lcr
>  Settable Thresholds   :
>  Threshold Read Mask   : lcr
>  Assertion Events      :
>  Assertions Enabled    : lcr-
>  Deassertions Enabled  : lcr-
> ...clip...
>
> Does that mean that my speed of 1320 RPM is below the 'normal minimum' 
> of 16680.000 RPM?  That's quite a difference, if so...  It shows that 
> for almost all of the fans.
>
> On 12/08/10 14:37, Ryan Cox wrote:
>> That is interesting.  That appears to be showing two sets of fans 
>> (the 7.1s and 10.*).  Do you have 4 power supplies in those?  I don't 
>> know off the top of my head how many PSUs an R910 takes.  10.* in 
>> ipmitool is for power supplies and 7 is for the system board (see 
>> "ipmitool sdr entity help").  Maybe reseat them one at a time if you 
>> have enough power?
>>
>> You could try the ipmitool command with "-v" to see more information
>>
>> Erich Weiler wrote:
>>> Very useful:
>>>
>>> [r...@server ~]# ipmitool sdr type Fan
>>> FAN 1 RPM        | 30h | ok  |  7.1 | 1320 RPM
>>> FAN 2 RPM        | 31h | ok  |  7.1 | 1320 RPM
>>> FAN 3 RPM        | 32h | ok  |  7.1 | 1440 RPM
>>> FAN 4 RPM        | 33h | ok  |  7.1 | 1680 RPM
>>> FAN 5 RPM        | 34h | ok  |  7.1 | 1560 RPM
>>> FAN 6 RPM        | 35h | ok  |  7.1 | 1680 RPM
>>> Fan RPM          | 36h | ok  | 10.1 | 3480 RPM
>>> Fan RPM          | 37h | ok  | 10.2 | 10080 RPM
>>> Fan RPM          | 38h | ok  | 10.3 | 3120 RPM
>>> Fan RPM          | 39h | ok  | 10.4 | 2160 RPM
>>> Fan Redundancy   | 75h | ok  |  7.1 | Fully Redundant
>>>
>>> I wonder why one fan is so fast while the others are slower.  I'm 
>>> beginning to think the BIOS might be the next step, to check Fan 
>>> speed options...
>>>
>>> On 12/08/10 13:55, Ryan Cox wrote:
>>>> We don't use OMSA here but do use ipmitool extensively.  This may 
>>>> get you what you need.
>>>>
>>>> Load the following kernel modules first:  ipmi_si, ipmi_devintf, 
>>>> ipmi_msghandler
>>>> Give it a few seconds and then run:
>>>> ipmitool sdr type Fan
>>>>
>>>> It can also be run remotely against an iDRAC (or BMC).
>>>>
>>>> We have had thermal issues before and it was almost always the 
>>>> result of thumbscrews that weren't in all the way.  There 
>>>> definitely could be a different issue though.
>>>>
>>>> Ryan
>>>>
>>>> On 12/08/2010 02:49 PM, Erich Weiler wrote:
>>>>> Yeah, I tried OMSA, but for the life of me could not get it to 
>>>>> read anything from the IPMI/BIOS interfaces.  No idea why.  I may 
>>>>> just have to reboot and go into the BIOS manually and see what I 
>>>>> can see there.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> On 12/08/10 13:49, Bond Masuda wrote:
>>>>>> Have you checked the fan speeds? Are they at full throttle? I 
>>>>>> don't know much about the R910, but usually you can get fan speed 
>>>>>> readings from OMSA.
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: [email protected]
>>>>>> [mailto:[email protected]] On Behalf Of Erich Weiler
>>>>>> Sent: Wednesday, December 08, 2010 1:40 PM
>>>>>> To: Ryan Cox
>>>>>> Cc: [email protected]
>>>>>> Subject: Re: R910/Linux CPU Heat Problems?
>>>>>>
>>>>>>> Just so you know, the kernel is merely responding to interrupts 
>>>>>>> from the processor cores themselves saying they are over 
>>>>>>> temperature.  The cores have their thresholds set and the kernel 
>>>>>>> can't and doesn't mess with them.  If the kernel reports the 
>>>>>>> processors are hot, the processors are actually hot.
>>>>>>
>>>>>> Ah, good to know.  It may be that the air is simply not cool 
>>>>>> enough in the datacenter, but this would be the first time I've 
>>>>>> ever seen this with any of our servers.  I'll double check the 
>>>>>> screws and fans and see if that might be an issue...
>>>>>>
>>>>>> _______________________________________________
>>>>>> Linux-PowerEdge mailing list
>>>>>> [email protected]
>>>>>> https://lists.us.dell.com/mailman/listinfo/linux-poweredge
>>>>>> Please read the FAQ at http://lists.us.dell.com/faq
>>>>>>
>>>>
>>

-- 
Ryan Cox
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
