Suggestion:

... upgrade the cooling capacity.

The CPU in my box is an AMD FX-9590, with a TDP of 220 watts, running at 4.7 GHz.

With a cooler rated for a 250-watt TDP, it ran hot under load.

With a cooler rated for a 900-watt TDP, it rarely gets close to 110 °F under
heavy load.
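To see where a box sits relative to a figure like that, here is a minimal sketch, assuming a Linux system that exposes its sensors under /sys/class/hwmon (each temp*_input file holds millidegrees Celsius). It lists every temperature input and flags anything above 110 °F. The function names and the 110 °F limit are my own choices for illustration; which chips appear (e.g. coretemp on Intel, k10temp on AMD) depends on the drivers loaded.

```python
from pathlib import Path


def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def read_hwmon_temps(root: Path = Path("/sys/class/hwmon")) -> dict[str, float]:
    """Return {"chip/tempN_input": degrees C} for every readable hwmon sensor.

    Assumes the standard hwmon sysfs layout: hwmonN/name holds the driver
    name, hwmonN/tempM_input holds the reading in millidegrees Celsius.
    """
    temps: dict[str, float] = {}
    for hwmon in sorted(root.glob("hwmon*")):
        name_file = hwmon / "name"
        try:
            chip = name_file.read_text().strip()
        except OSError:
            chip = hwmon.name  # fall back to the directory name
        for inp in sorted(hwmon.glob("temp*_input")):
            try:
                millideg = int(inp.read_text().strip())
            except (OSError, ValueError):
                continue  # sensor missing or unreadable; skip it
            temps[f"{chip}/{inp.name}"] = millideg / 1000.0
    return temps


if __name__ == "__main__":
    limit_f = 110.0  # hypothetical alert threshold, taken from the figure above
    for sensor, c in read_hwmon_temps().items():
        f = c_to_f(c)
        flag = "  <-- above 110 F" if f > limit_f else ""
        print(f"{sensor}: {c:.1f} C / {f:.1f} F{flag}")
```

Run it while the machine is under load; lm-sensors' `sensors` command reports the same hwmon data if you would rather not script it.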


On 04/20/2018 09:11 AM, R0b0t1 wrote:
> On Fri, Apr 20, 2018 at 7:21 AM, Mick <michaelkintz...@gmail.com> wrote:
>> On Friday, 20 April 2018 12:55:13 BST Corbin Bird wrote:
>>> Oak Ridge National Laboratory uses these processors ( Rhea Cluster ) and
>>> has numerous heat failures.
>>>
>>> Due to poor cooling ... surprised?
>>>
>>> The cooling is not working right. Something is still wrong.
>>>
>>> On 04/19/2018 09:33 PM, R0b0t1 wrote:
>>>> Dell Precision T7600, two 16-thread Xeons, 192GB of RAM, two Quadro
>>>> cards and a Tesla card.
>>>>
>>>> The system is a few years old at this point. Old enough that the
>>>> thermal compound could have hardened, which is why I replaced it.
>> If the problem started suddenly, rather than getting progressively worse over
>> time, it may have something to do with kernel drivers, or some change in
>> firmware.
>>
> As far as I know it has always been like this. It may be why it was
> hardly used before it came into my care. Looking at the server I could
> blame poor design; the inside is rather cramped, despite the care
> taken with the internal baffles. They may not have run a good flow
> simulation.
>
> Mr. Bird's observation seems to support this.
>
>> If the cause is mechanical, I'd also suggest checking the heat sink contact
>> surface.  Some heat sinks are poorly manufactured and require flattening with
>> wet 'n dry sandpaper to get a flat enough surface and improve their contact
>> with the CPU.  I've seen 15°C improvement in a Zalman CPU cooler after excess
>> metal was removed from copper pipes, which were manufactured proud.  Hardcore
>> O/C's flatten the CPU too, but I'd avoid anything that radical because it can
>> go badly wrong if you remove more than the surface varnish from the chip.
>>
>> In the interim, opening the side panel may also help in hot weather.
>>
> The internals are custom made to fit the motherboard, cards, and drive
> slots. It may work better if I move it to another tower but it will be
> a while before I can find one. I will look at the interface between
> the heatsink and processor again, but it looked fine.
>
>
> How concerned should I be about overheating machine check errors? I
> used to think that it was best to avoid them, as the threshold was
> high enough that very small parts of the die could overshoot and fail,
> but I was informed that is not the case. Besides the throttling (which
> is fairly bad) I am not sure if there are any drawbacks to the
> overheating.
>
> I am wondering what the point of 32 threads is if you can't use them at 100%.
>
> Cheers,
>      R0b0t1

