Benjamin Mahler created MESOS-1572:
--------------------------------------
Summary: Consider automatically restarting slaves/masters on soft/hard CPU
lockups.
Key: MESOS-1572
URL: https://issues.apache.org/jira/browse/MESOS-1572
Project: Mesos
Issue Type: Bug
Components: master, slave
Reporter: Benjamin Mahler
On Linux systems, when a soft or hard CPU lockup occurs on slaves, we have
observed strange and undesirable behavior, possibly due to kernel bugs.
With root access and {{sysctl}}, it should be possible to configure the
machine to reboot automatically when a soft/hard lockup occurs: the kernel
can be told to panic on a lockup, and a nonzero panic timeout then makes it
reboot after the panic:
https://www.kernel.org/doc/Documentation/lockup-watchdogs.txt
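As a rough illustration of which knobs are involved, here is a minimal Python sketch, assuming the standard /proc/sys layout, that reports whether a machine is currently set up to reboot on a lockup. It reads kernel.softlockup_panic, kernel.hardlockup_panic and kernel.panic (the panic timeout in seconds, where 0 means "stay hung"). The helper names read_sysctls and reboots_on_lockup are hypothetical, and kernel.hardlockup_panic does not exist on older kernels, which use the nmi_watchdog=panic boot parameter instead.

```python
import os
from typing import Dict, Optional

# Sysctl files, relative to /proc/sys, that control lockup-triggered reboots.
# kernel.hardlockup_panic is absent on older kernels; there the boot parameter
# nmi_watchdog=panic is used instead, which this sketch does not inspect.
SYSCTLS = ("kernel/softlockup_panic", "kernel/hardlockup_panic", "kernel/panic")


def read_sysctls(proc_sys: str = "/proc/sys") -> Dict[str, Optional[int]]:
    """Return each sysctl's integer value, or None if the file is missing."""
    values: Dict[str, Optional[int]] = {}
    for rel in SYSCTLS:
        name = rel.replace("/", ".")  # e.g. "kernel.panic"
        try:
            with open(os.path.join(proc_sys, rel)) as f:
                values[name] = int(f.read().strip())
        except FileNotFoundError:
            values[name] = None
    return values


def reboots_on_lockup(values: Dict[str, Optional[int]]) -> Dict[str, bool]:
    """A lockup reboots the machine only if it panics AND kernel.panic != 0."""
    panic_timeout = values.get("kernel.panic") or 0
    reboots = panic_timeout != 0  # 0 means "hang forever after a panic"
    return {
        "soft": reboots and values.get("kernel.softlockup_panic") == 1,
        "hard": reboots and values.get("kernel.hardlockup_panic") == 1,
    }


if __name__ == "__main__":
    print(reboots_on_lockup(read_sysctls()))
```

To actually enable the reboot for soft lockups, one would set kernel.softlockup_panic = 1 and a nonzero kernel.panic (for example 10) via {{sysctl}} or /etc/sysctl.conf; the check above only reads the current state and changes nothing.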
--
This message was sent by Atlassian JIRA
(v6.2#6252)