Onno, I'm not sure how Dell lets you configure the server at order time, but Dell often has configuration limitations that are not immediately obvious. As a concrete example, if you try to configure a server with too much RAM or too many disks, you may receive a warning that you have to upgrade the PSU in the server to complete your order.

As another possible case, I could imagine that Dell allows you to install rear disks but limits them to 7.2k rpm models (or to disks rated to run at higher temperatures, draw less power, etc.). If you are installing disks yourself, you may not be aware of some of these limitations, as they may not be documented clearly or publicly. Every disk manufacturer sets its own temperature thresholds, so while one drive may support 60 C, another may top out at 50 C. This isn't to say that either disk will be reliable if kept under those temperatures, just that those are the manufacturer's recommended operating temperatures and that they vary from model to model. If you're seeing high failure rates on rear disks, and the only obvious difference between front and rear disks is the operating temperature, I think that's a strong indicator that temperature could be a factor. Going forward, you might consider using SSDs (which often produce less heat), lower-rpm disks (which produce less heat), or disks rated for higher temperature extremes to see if reliability improves.
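
Since the thresholds differ per model, one way to compare drives is to look at each drive's margin to its own limit rather than at the raw temperature. The following is only a sketch, assuming SAS drives whose `smartctl -a` output contains a "Drive Trip Temperature" line next to "Current Drive Temperature"; `temp_margin` is a hypothetical helper name, not part of smartmontools.

```shell
#!/usr/bin/env bash
# Sketch: report how close a drive runs to its own trip temperature.
# Reads `smartctl -a` output on stdin and prints "current trip margin" in C.
# temp_margin is a hypothetical helper, not a smartmontools command.
temp_margin() {
    awk -F: '
        /^Current Drive Temperature/ { cur  = $2 + 0 }   # e.g. "     58 C" -> 58
        /^Drive Trip Temperature/    { trip = $2 + 0 }   # e.g. "     60 C" -> 60
        END {
            # Fail if either line is missing (e.g. a drive that does not report it).
            if (cur == "" || trip == "") exit 1
            print cur, trip, trip - cur
        }'
}
```

It could be used as `smartctl -a /dev/sdb -d megaraid,12 | temp_margin`. A small margin on the rear disks and a large one on the front disks would support the temperature theory, though it would not prove it.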

Onno Zweers wrote on 10/11/2019 2:43 AM:
Following up.

I checked two classes of servers:
R730xd - rear disk 58°C
R740xd2 - rear disk 33°C

That's a huge difference. The fan speeds were similar, between 10,300 and 
11,160 rpm. I don't think this accounts for the difference in temperature. 
Perhaps the airflow of the system has been improved in the R740. But there is 
one significant difference: in the R740xd2, the rear disks are SSDs, whereas the 
R730xd has spinning disks.
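
To confirm which rear disks are SSDs and which are spinning, the "Rotation Rate" line that smartctl prints in its information section can be checked per disk. This is a rough sketch; `drive_kind` is a made-up helper name, and it assumes the drive reports a rotation rate at all.

```shell
#!/usr/bin/env bash
# Sketch: classify a drive as "ssd" or "spinning" from `smartctl -i` output on stdin.
# drive_kind is a hypothetical helper; drives without a "Rotation Rate" line
# are reported as "unknown".
drive_kind() {
    awk -F: '
        /^Rotation Rate/ {
            rate = $2
            sub(/^[ \t]+/, "", rate)              # strip the padding after the colon
            kind = (rate == "Solid State Device") ? "ssd" : "spinning"
            print kind
            found = 1
        }
        END { if (!found) print "unknown" }'
}
```

For example: `smartctl -i /dev/sdb -d megaraid,12 | drive_kind`.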

Cheers,
Onno

On 10 Oct 2019, at 20:19, Onno Zweers <[email protected]> wrote:

Thanks everyone for the very useful answers. I had a quick look:

[root@shark5 ~]# for disk in $(smartctl --scan | egrep -o 'megaraid,[0-9]+') ; do echo -n "$disk - " ; smartctl -a /dev/sdb -d "$disk" | grep 'Current Drive Temperature' ; done
megaraid,0 - Current Drive Temperature:     31 C
megaraid,1 - Current Drive Temperature:     32 C
megaraid,2 - Current Drive Temperature:     32 C
megaraid,3 - Current Drive Temperature:     31 C
megaraid,4 - Current Drive Temperature:     32 C
megaraid,5 - Current Drive Temperature:     30 C
megaraid,6 - Current Drive Temperature:     32 C
megaraid,7 - Current Drive Temperature:     32 C
megaraid,8 - Current Drive Temperature:     31 C
megaraid,9 - Current Drive Temperature:     32 C
megaraid,10 - Current Drive Temperature:     34 C
megaraid,11 - Current Drive Temperature:     32 C
megaraid,12 - Current Drive Temperature:     56 C
megaraid,13 - Current Drive Temperature:     58 C
megaraid,14 - Current Drive Temperature:     44 C
megaraid,15 - Current Drive Temperature:     45 C
megaraid,16 - Current Drive Temperature:     47 C
megaraid,17 - Current Drive Temperature:     51 C

58 degrees C seems very hot to me, and indeed disks 12 and 13 are in the back 
of the machine. We have lots of these servers and we've noticed that these rear 
disks fail rather often. The 2 disks in the rear have as many failures as the 
12 disks in front. I guess the next step would be to check at which speed the 
fans are blowing.
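
With many servers of this type, the listing above could be filtered automatically so that only the hot disks show up. A minimal sketch, assuming input lines in exactly the format printed above; `hot_disks` is a hypothetical helper, and the default limit of 50 C is an arbitrary assumption, not a Dell or manufacturer value.

```shell
#!/usr/bin/env bash
# Sketch: print only the disks at or above a temperature limit.
# Expects lines like "megaraid,12 - Current Drive Temperature:     56 C" on stdin.
# hot_disks is a hypothetical helper; the 50 C default is an assumed value.
hot_disks() {
    local limit=${1:-50}
    awk -v limit="$limit" '
        /Current Drive Temperature:/ {
            temp = $(NF - 1) + 0          # second-to-last field is the number before "C"
            if (temp >= limit) print $1, temp
        }'
}
```

Piping the output of the loop above into `hot_disks 50` would list only the disks reading 50 C or more.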

Cheers,
Onno

_______________________________________________
Linux-PowerEdge mailing list
[email protected]
https://lists.us.dell.com/mailman/listinfo/linux-poweredge
