On 05/23/2015 06:53 PM, Joseph wrote:
On 05/23/15 18:08, Zhu Sha Zang wrote:
On 05/23/2015 05:24 PM, Joseph wrote:
I have a box in a remote location (8-core CPU) and it turns itself off
while compiling.

The box is connected to a UPS.  Could it be the power supply?


Maybe. I had a problem like that when running heavy simulations with
nvidia-cuda: the power supply protection could not keep a safe energy
level, so the system shut off.

But if the failure happens during compilation, it could be a heat
problem. Install lm_sensors and run something like "watch -n 1
sensors".
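
Since the box is remote and powers off by itself, whatever "watch" last
showed on screen is lost. A minimal sketch, assuming Python 3 and that
the "sensors" command is on the PATH, which appends every sample to a
file and syncs it to disk so the last readings survive a shutdown. The
names log_reading, monitor and sensors.log are made up for this example.

```python
import os
import subprocess
import time
from datetime import datetime


def log_reading(logfile, command=("sensors",)):
    """Run `command` once and append its timestamped output to `logfile`."""
    result = subprocess.run(list(command), capture_output=True, text=True)
    stamp = datetime.now().isoformat(timespec="seconds")
    with open(logfile, "a", encoding="utf-8") as fh:
        fh.write(f"=== {stamp} ===\n{result.stdout}")
        fh.flush()
        # Force the data to disk so it is not lost if the box powers off.
        os.fsync(fh.fileno())
    return result.stdout


def monitor(logfile="sensors.log", interval=1.0):
    """Sample once every `interval` seconds until interrupted (Ctrl-C)."""
    while True:
        log_reading(logfile)
        time.sleep(interval)
```

Calling monitor() in one terminal (or under screen/tmux) while the
compile runs in another should leave the last temperatures before the
shutdown at the end of sensors.log.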

If not, and the temperature stays at safe levels, maybe you have RAM
corruption. In that case, you'll need to use memtest86+ to check.

Good Luck

I tried to read lm-sensors again and the computer crashed with these readings:

fan1:           0 RPM  (min =   10 RPM)  ALARM
fan2:           0 RPM  (min =    0 RPM)
fan3:           0 RPM  (min =    0 RPM)
fan5:           0 RPM  (min =    0 RPM)
temp1: +47.0°C (low = +127.0°C, high = +127.0°C) sensor = thermistor
temp2: +106.0°C (low = +127.0°C, high = +70.0°C) sensor = thermal diode
temp3: +106.0°C (low = +127.0°C, high = +127.0°C) sensor = thermistor
cpu0_vid:    +1.250 V

I suspect it is the power supply.


Hey, did you run "sensors-detect" and start "/etc/init.d/lm_sensors" as root before using "sensors"?

As was said, maybe you're using the wrong kernel modules.
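
One hint that the readings may not be trustworthy: in the output quoted
above, temp2 reports a low limit (+127.0°C) that is above its high limit
(+70.0°C). A rough sketch, assuming Python 3 and the line format shown
in that output, that flags such entries; check_temps is a made-up name
and the two rules are only a plausibility check, not a diagnosis.

```python
import re

# Matches e.g. "temp2: +106.0°C (low = +127.0°C, high = +70.0°C)"
NUM = r"[+-]?\d+(?:\.\d+)?"
TEMP_ENTRY = re.compile(
    rf"(?P<name>temp\d+):\s*(?P<value>{NUM})°C\s*"
    rf"\(low\s*=\s*(?P<low>{NUM})°C,\s*high\s*=\s*(?P<high>{NUM})°C\)"
)


def check_temps(text):
    """Return (name, message) pairs for temperature entries that look wrong."""
    findings = []
    for m in TEMP_ENTRY.finditer(text):
        value = float(m.group("value"))
        low = float(m.group("low"))
        high = float(m.group("high"))
        if low > high:
            findings.append((m.group("name"), "low limit above high limit"))
        if value > high:
            findings.append((m.group("name"), "reading above high limit"))
    return findings
```

Fed the three temp lines from the message above, this reports only
temp2, for both rules. Whether that points to a real overheat or to a
wrong driver still has to be checked on the machine itself.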

Regards
