Benjamin Mahler created MESOS-1572:
--------------------------------------

             Summary: Consider automatic restarting of slaves/masters with soft/hard CPU lockups.
                 Key: MESOS-1572
                 URL: https://issues.apache.org/jira/browse/MESOS-1572
             Project: Mesos
          Issue Type: Bug
          Components: master, slave
            Reporter: Benjamin Mahler


On Linux systems, when a soft or hard CPU lockup occurs on a slave, we have 
observed strange and undesirable behavior, possibly due to kernel bugs.

With root access and {{sysctl}}, it should be possible to configure an 
automatic reboot of the machine when a soft/hard lockup occurs:
https://www.kernel.org/doc/Documentation/lockup-watchdogs.txt
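
As a rough illustration of the configuration described above, the sketch below renders a sysctl fragment that makes the kernel panic on a lockup and then reboot after a panic. The helper name {{render_lockup_sysctl}} and the 10-second delay are assumptions made for this example, not anything Mesos provides. {{kernel.softlockup_panic}} and {{kernel.panic}} are described in the kernel document linked above; {{kernel.hardlockup_panic}} may not exist as a sysctl on older kernels, where the {{nmi_watchdog=panic}} boot parameter is the alternative, so it can be left out.

```python
# Sketch only: builds the text of a sysctl fragment (for example a file
# under /etc/sysctl.d/, an assumed location) that turns lockups into
# panics and panics into automatic reboots.

def render_lockup_sysctl(reboot_delay_secs=10, include_hardlockup=True):
    """Return sysctl settings as 'key = value' lines.

    reboot_delay_secs is written to kernel.panic, the number of seconds
    the kernel waits after a panic before rebooting. A value of 0 would
    disable the reboot, so this sketch rejects non-positive values.
    """
    if reboot_delay_secs <= 0:
        raise ValueError("reboot_delay_secs must be positive")

    settings = [("kernel.softlockup_panic", 1)]
    if include_hardlockup:
        # Assumption: the running kernel exposes this sysctl.
        settings.append(("kernel.hardlockup_panic", 1))
    settings.append(("kernel.panic", reboot_delay_secs))

    return "".join(f"{key} = {value}\n" for key, value in settings)


if __name__ == "__main__":
    print(render_lockup_sysctl(), end="")
```

The rendered text would still need to be installed and loaded as root (for example with {{sysctl -p <file>}}), and the settings should be tried on a single machine before any wider rollout.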


