On 05/23/15 20:52, Zhu Sha Zang wrote:
On 05/23/2015 06:53 PM, Joseph wrote:
On 05/23/15 18:08, Zhu Sha Zang wrote:
On 05/23/2015 05:24 PM, Joseph wrote:
I have a box in a remote location (8-core CPU), and it turns itself off
during compiling.

The box is connected to a UPS.  Is it the power supply?


Maybe. I had a problem like that when running heavy simulations with
nvidia-cuda: the power supply protection was unable to keep a safe
energy level, and the system went off.

But if the failure happens during compilation, it can be a heat
problem. Install lm_sensors and use something like: "watch -n 1
sensors".
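
Since the box powers off without warning, the last reading shown by "watch" is lost with the screen. A minimal sketch of one way to keep it, assuming Python 3 is available and that the "sensors" command is on the PATH: each reading is appended to a file and forced to disk. The names append_reading and log_sensors, and any log path passed to them, are hypothetical.

```python
import os
import subprocess
import time


def append_reading(log_path, text):
    """Append one timestamped reading and force it onto the disk."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"--- {stamp} ---\n{text.rstrip()}\n")
        log.flush()
        # fsync so the last reading survives a sudden power-off
        os.fsync(log.fileno())


def log_sensors(log_path, interval=1.0):
    """Run `sensors` every `interval` seconds until interrupted (Ctrl-C)."""
    while True:
        result = subprocess.run(
            ["sensors"],
            capture_output=True,
            encoding="utf-8",
            errors="replace",
        )
        append_reading(log_path, result.stdout)
        time.sleep(interval)
```

Calling log_sensors with a path of your choice before starting the compile would leave the readings from just before the shutdown in that file.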

If not, and the temperature stays at safe levels, maybe you have RAM
corruption. In that case, you'll need to use memtest86+ to check.

Good Luck

I tried reading lm-sensors again, and the computer crashed with these
readings:

fan1:           0 RPM  (min =   10 RPM)  ALARM
fan2:           0 RPM  (min =    0 RPM)
fan3:           0 RPM  (min =    0 RPM)
fan5:           0 RPM  (min =    0 RPM)
temp1:        +47.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
temp2:       +106.0°C  (low  = +127.0°C, high = +70.0°C)  sensor = thermal diode
temp3:       +106.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
cpu0_vid:    +1.250 V
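
Readings in this format can be checked against their own "high" limits with a short script. The following is only a sketch, assuming Python 3 and the line layout shown above; the names TEMP_LINE, over_limit and SAMPLE are hypothetical, and the sample text is copied from the readings in this message. The +127.0°C limits may simply be unconfigured defaults (an assumption), in which case only temp2 has a meaningful limit.

```python
import re

# Matches lines such as:
#   temp2:  +106.0°C  (low  = +127.0°C, high = +70.0°C)  sensor = ...
TEMP_LINE = re.compile(
    r"^(?P<name>temp\d+):\s+\+?(?P<value>-?\d+(?:\.\d+)?)°C"
    r".*?high\s*=\s*\+?(?P<high>-?\d+(?:\.\d+)?)°C"
)


def over_limit(sensors_text):
    """Return (name, value, high) for each temperature above its high limit."""
    hits = []
    for line in sensors_text.splitlines():
        match = TEMP_LINE.match(line.strip())
        if match:
            value = float(match.group("value"))
            high = float(match.group("high"))
            if value > high:
                hits.append((match.group("name"), value, high))
    return hits


# Sample copied from the readings above (wrapped lines joined).
SAMPLE = """\
temp1:        +47.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
temp2:       +106.0°C  (low  = +127.0°C, high = +70.0°C)  sensor = thermal diode
temp3:       +106.0°C  (low  = +127.0°C, high = +127.0°C)  sensor = thermistor
"""

print(over_limit(SAMPLE))  # [('temp2', 106.0, 70.0)]
```

On these readings only temp2 is reported, at 106.0°C against a 70.0°C limit.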

I suspect it is the power supply.


Hey, did you run "sensors-detect" and "/etc/init.d/lm_sensors" as root
before using "sensors"?

As was said, maybe you're using the wrong kernel modules.

I went to pick up the remote box and looked at it; the CPU fan had stopped working. The CPU heat sink is big, so at idle it could keep up with cooling, but under heavy load (compiling anything) the CPU was overheating.

--
Joseph