A few additions:
I would highly recommend checking the thumbscrews on the heat sinks if
the CPUs are legitimately hot.  Uneven thermal paste coverage on the
CPUs can cause issues too.

Also, "rdmsr -f 23:16 -d 0x1a2" will return the temperature threshold
in degrees C (for CPU 0 unless you pass -p).  If a core hits that
temperature, it will be throttled.  I haven't tried this with
hyperthreading, so I don't know whether you'll get "extra" results when
querying all the threads.  I'm guessing both threads of a core will
return that core's temperature.
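
To check the hyperthreading question, here is a rough sketch that prints
the threshold for every logical CPU next to its package and core ID, so
sibling threads show up side by side.  It assumes the msr module is
loaded, rdmsr is in the PATH, and the kernel exposes the usual
topology files under /sys/devices/system/cpu.  The "topo" helper is a
name made up for this example.

```shell
#!/bin/sh
# Print the throttle threshold of each logical CPU with its package/core ID.
# Assumption: topology files exist under /sys/devices/system/cpu/cpuN/topology.

# Hypothetical helper: read one topology attribute, or "?" if unavailable.
# $1 = sysfs cpu directory, $2 = attribute name
topo() {
    cat "$1/topology/$2" 2>/dev/null || echo "?"
}

if command -v rdmsr >/dev/null 2>&1; then
    for dev in /dev/cpu/[0-9]*; do
        # Skip if the msr device is missing (module not loaded)
        [ -e "$dev/msr" ] || continue
        cpu=$(basename "$dev")
        sys=/sys/devices/system/cpu/cpu$cpu
        # Bits 23:16 of MSR 0x1a2 hold the threshold in degrees C
        threshold=$(rdmsr -f 23:16 -p"$cpu" -d 0x1a2) || continue
        printf 'cpu %3d  package %s  core %s  threshold %s C\n' \
            "$cpu" "$(topo "$sys" physical_package_id)" \
            "$(topo "$sys" core_id)" "$threshold"
    done
else
    echo "rdmsr not found; install msr-tools and load the msr module" >&2
fi
```

Logical CPUs that report the same package and core ID are hyperthread
siblings, so you can compare their values directly.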

Just so you know, the kernel is merely responding to interrupts from the 
processor cores themselves saying they are over temperature.  The cores 
have their thresholds set and the kernel can't and doesn't mess with 
them.  If the kernel reports the processors are hot, the processors are 
actually hot.
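
If you want a tally of those events without digging through syslog, the
kernel may also keep per-CPU throttle counters in sysfs.  The sketch
below assumes a thermal_throttle directory exists for each CPU and that
the counter file is named either core_throttle_count or count; which one
you have (if any) depends on the kernel version, so treat the paths as
assumptions.  The "throttle_count" helper is a name made up for this
example.

```shell
#!/bin/sh
# List logical CPUs that the kernel has recorded as thermally throttled.
# Assumption: counters live in /sys/devices/system/cpu/cpuN/thermal_throttle/.

# Hypothetical helper: print the throttle count for one sysfs cpu directory.
# Returns non-zero if no counter file is readable.
throttle_count() {
    for f in core_throttle_count count; do
        if [ -r "$1/thermal_throttle/$f" ]; then
            cat "$1/thermal_throttle/$f"
            return 0
        fi
    done
    return 1
}

for dir in /sys/devices/system/cpu/cpu[0-9]*; do
    n=$(throttle_count "$dir") || continue
    if [ "$n" -gt 0 ]; then
        printf '%s: throttled %s times\n' "${dir##*/}" "$n"
    fi
done
```

No output means either nothing has been throttled since boot or the
counters aren't available on that kernel.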

Ryan Cox

On 12/08/2010 02:09 PM, Ryan Cox wrote:
> Try running the following code. Load the "msr" kernel module and be sure
> rdmsr is installed.  It's available from
> http://www.kernel.org/pub/linux/utils/cpu/msr-tools/ and is simple to
> compile.
>
> for a in /dev/cpu/[0-9]*
> do
>       cpu=$(basename $a)
>       printf "%2d: " $cpu
>       # threshold (MSR 0x1a2) minus degrees below threshold (MSR 0x19c)
>       echo $(($(rdmsr -f 23:16 -p$cpu -d 0x1a2) - $(rdmsr -f 22:16 -p$cpu -u 0x19c)))
> done
>
> That should return the core temperatures in Celsius by reading the
> values from the CPU MSRs.  I may have some other ideas for you if what
> that reveals doesn't help.
>
> Ryan
>
> On 12/08/2010 02:00 PM, Erich Weiler wrote:
>> Hi All,
>>
>> We're running CentOS 5.5 (kernel 2.6.18-194.3.1.el5) on two Dell R910
>> servers.  We're periodically getting CPU overheating messages spit out
>> from syslogd:
>>
>> Message from syslogd@ at Fri Dec  3 12:06:56 2010 ...
>> server kernel: CPU60: Temperature above threshold, cpu clock throttled
>>
>> Message from syslogd@ at Fri Dec  3 12:06:56 2010 ...
>> server kernel: CPU28: Temperature above threshold, cpu clock throttled
>>
>> Message from syslogd@ at Fri Dec  3 12:06:56 2010 ...
>> server kernel: CPU24: Temperature/speed normal
>>
>> Message from syslogd@ at Fri Dec  3 12:06:56 2010 ...
>> server kernel: CPU32: Temperature/speed normal
>>
>> The servers are well ventilated in a datacenter, and they both exhibit
>> the same problem when under load.  I think the fans are working OK, but
>> maybe these CPUs just run a little hotter than others, which may be
>> triggering the threshold in the kernel?   Anyone else seen this before?
>>
>> lm_sensors doesn't work on these boxes.  Any info on why it's happening,
>> or a good way to query the CPU temps, would be much appreciated!
>>
>> TIA!
>>
>> _______________________________________________
>> Linux-PowerEdge mailing list
>> [email protected]
>> https://lists.us.dell.com/mailman/listinfo/linux-poweredge
>> Please read the FAQ at http://lists.us.dell.com/faq

-- 
Ryan Cox
Systems Administrator
Fulton Supercomputing Lab
Brigham Young University
