I guess all this engineering isn't working for you?

http://www.youtube.com/watch?v=oulHU7hGRDM



-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Erich Weiler
Sent: Wednesday, December 08, 2010 1:28 PM
To: Ryan Cox
Cc: [email protected]
Subject: Re: R910/Linux CPU Heat Problems?

Thanks Ryan!

So far so good.  The server is under load now, as I run the script:

[r...@server msr-tools-1.2]# ./showtemps | sort
  0: 67
  1: 70
  2: 63
  3: 77
  4: 65
  5: 71
  6: 65
  7: 77
  8: 67
  9: 72
10: 64
11: 77
12: 66
13: 73
14: 65
15: 76
16: 69
17: 76
18: 64
19: 78
20: 70
21: 75
22: 66
23: 77
24: 69
25: 79
26: 66
27: 81
28: 68
29: 75
30: 70
31: 76
32: 67
33: 70
34: 63
35: 77
36: 67
37: 71
38: 65
39: 78
40: 66
41: 72
42: 63
43: 78
44: 64
45: 71
46: 64
47: 76
48: 70
49: 75
50: 62
51: 78
52: 70
53: 75
54: 64
55: 77
56: 67
57: 79
58: 65
59: 82
60: 68
61: 75
62: 65
63: 76

So, >70C seems kind of hot, and I'm not sure why it would be happening.
Aren't these servers supposed to be able to handle high load?
According to the Intel documentation on the 7500 Series processors, the
max recommended temp at 130W should be about 69C (see page 115):

http://www.intel.com/Assets/en_US/PDF/datasheet/323340.pdf

So I'm definitely exceeding that on some of those CPUs...
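
To pick out just the cores above that figure, the script output can be filtered. This is a minimal sketch, assuming the "cpu: temp" line format shown above; the hot_cores function name and the default of 69 are my own choices for illustration and are not part of msr-tools:

```shell
# hot_cores LIMIT: read "cpu: temp" lines on stdin and print only those
# whose temperature is strictly above LIMIT (default 69, the figure
# quoted from the Intel datasheet above).
hot_cores() {
    awk -F': *' -v limit="${1:-69}" '
        # $1 is the CPU number, $2 the temperature in Celsius
        $2 + 0 > limit + 0 { printf "%2d: %d\n", $1, $2 }
    '
}
```

Usage would be something like "./showtemps | hot_cores 69", with "| wc -l" appended to get a count.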

Maybe the fans aren't spinning fast enough as configured in the BIOS?
I haven't tweaked the BIOS, though; it's mostly factory default...

Thanks again!

On 12/08/10 13:09, Ryan Cox wrote:
> Try running the following code. Load the "msr" kernel module and make
> sure rdmsr is installed.  rdmsr is part of msr-tools, which is available
> from http://www.kernel.org/pub/linux/utils/cpu/msr-tools/ and is simple
> to compile.
>
> for a in /dev/cpu/[0-9]*
> do
>     cpu=$(basename "$a")
>     printf "%2d: " "$cpu"
>     echo $(($(rdmsr -f 23:16 -p$cpu -d 0x1a2) - $(rdmsr -f 22:16 -p$cpu -u 0x19c)))
> done
> 
> That should return the core temperatures in Celsius by reading the 
> values from the CPU MSRs.  I may have some other ideas for you if what 
> that reveals doesn't help.
> 
> Ryan
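
For anyone reading along, the subtraction in that loop can be written out by hand. This is a minimal sketch, assuming the usual Intel layout: bits 23:16 of MSR 0x1a2 hold the throttle target (often called TjMax), and bits 22:16 of MSR 0x19c hold how many degrees the core currently sits below that target. The core_temp function name and the sample register values are made up for illustration:

```shell
# core_temp TARGET_RAW STATUS_RAW: given the raw contents of MSR 0x1a2 and
# MSR 0x19c as 0x-prefixed hex values, print the core temperature in Celsius.
core_temp() {
    tjmax=$(( ($1 >> 16) & 0xff ))    # bits 23:16 of MSR 0x1a2
    below=$(( ($2 >> 16) & 0x7f ))    # bits 22:16 of MSR 0x19c
    echo $(( tjmax - below ))
}

# Hypothetical register values: target of 100C, reading 32 degrees below it.
core_temp 0x640000 0x88200000    # prints 68
```

The "-f" option to rdmsr does the same shift-and-mask step, which is why the loop only needs the subtraction.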
> 
> On 12/08/2010 02:00 PM, Erich Weiler wrote:
>> Hi All,
>>
>> We're running CentOS 5.5 (kernel 2.6.18-194.3.1.el5) on two Dell R910
>> servers.  We're periodically getting CPU overheating messages spit out
>> from syslogd:
>>
>> Message from syslogd@ at Fri Dec  3 12:06:56 2010 ...
>> server kernel: CPU60: Temperature above threshold, cpu clock throttled
>>
>> Message from syslogd@ at Fri Dec  3 12:06:56 2010 ...
>> server kernel: CPU28: Temperature above threshold, cpu clock throttled
>>
>> Message from syslogd@ at Fri Dec  3 12:06:56 2010 ...
>> server kernel: CPU24: Temperature/speed normal
>>
>> Message from syslogd@ at Fri Dec  3 12:06:56 2010 ...
>> server kernel: CPU32: Temperature/speed normal
>>
>> The servers are well ventilated in a datacenter, and they both exhibit
>> the same problem when under load.  I think the fans are working OK, but
>> maybe these CPUs just run a little hotter than others, which may be
>> triggering the threshold in the kernel?   Anyone else seen this before?
>>
>> lm_sensors doesn't work on these boxes.  Any info on why it's happening,
>> or a good way to query the CPU temps, would be much appreciated!
>>
>> TIA!
>>
>> _______________________________________________
>> Linux-PowerEdge mailing list
>> [email protected]
>> https://lists.us.dell.com/mailman/listinfo/linux-poweredge
>> Please read the FAQ at http://lists.us.dell.com/faq
> 
