[jira] [Commented] (MESOS-5376) Add systemd watchdog support

Lawrence Wu (JIRA) Mon, 11 Jul 2016 11:12:54 -0700

    [ https://issues.apache.org/jira/browse/MESOS-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371320#comment-15371320 ]


Lawrence Wu commented on MESOS-5376:
------------------------------------

The basic, high-level plan is:
- check whether the WATCHDOG_USEC environment variable is set
- if it is set, spin off a thread that pings the systemd watchdog at 
intervals of half of WATCHDOG_USEC
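
The two steps above can be sketched roughly as follows. This is a minimal 
illustration of the systemd notification protocol in Python, not the planned 
Mesos change (which would live in the C++ code base). The function names 
watchdog_interval_seconds, notify_watchdog and start_watchdog_thread are 
hypothetical, and the sketch assumes systemd exports NOTIFY_SOCKET alongside 
WATCHDOG_USEC; it ignores WATCHDOG_PID for brevity.

```python
import os
import socket
import threading


def watchdog_interval_seconds(environ):
    """Return half of WATCHDOG_USEC in seconds, or None if the watchdog is off."""
    raw = environ.get("WATCHDOG_USEC")
    if not raw:
        return None
    try:
        usec = int(raw)
    except ValueError:
        return None
    if usec <= 0:
        return None
    return usec / 2 / 1_000_000


def notify_watchdog(environ):
    """Send WATCHDOG=1 to the datagram socket named by NOTIFY_SOCKET."""
    path = environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    # A leading '@' denotes a Linux abstract socket (address starts with NUL).
    if path.startswith("@"):
        address = b"\0" + path[1:].encode()
    else:
        address = path
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
            sock.sendto(b"WATCHDOG=1", address)
    except OSError:
        return False
    return True


def start_watchdog_thread(environ=os.environ, stop=None):
    """Start a daemon thread that pings the watchdog; None if not enabled."""
    interval = watchdog_interval_seconds(environ)
    if interval is None:
        return None
    stop = stop or threading.Event()

    def loop():
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval):
            notify_watchdog(environ)

    thread = threading.Thread(target=loop, name="watchdog", daemon=True)
    thread.start()
    return thread
```

One caveat with this shape: a dedicated ping thread only proves that the 
thread itself is alive. To catch a deadlock elsewhere in the process, the 
ping would need to be gated on some health check of the main event loop.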

I'll start digging into the code now.

> Add systemd watchdog support
> ----------------------------
>
>                 Key: MESOS-5376
>                 URL: https://issues.apache.org/jira/browse/MESOS-5376
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: David Robinson
>            Assignee: Lawrence Wu
>
> It would be great if Mesos had support for systemd's 
> [watchdog|http://0pointer.de/blog/projects/watchdog.html]. Users would 
> typically use a supervisor like [monit|https://mmonit.com/monit/] to check 
> the agent/master's /health endpoint and restart upon consecutive failures. 
> Systemd doesn't support polling services; it uses a watchdog to communicate 
> liveness instead. Supervisor solutions like monit could be replaced with 
> systemd if Mesos had watchdog support. Note that simply restarting the 
> service upon failure (i.e., when the process exits) is not sufficient -- a 
> deadlock within Mesos would not cause the process to exit, but a watchdog 
> could detect this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
