[ 
https://issues.apache.org/jira/browse/MESOS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Mahler updated MESOS-1648:
-----------------------------------
    Description: 
Right now we use a number of wrapper scripts to try to maintain a 
{{/var/run/mesos/mesos-slave.pid}} file so that the process can be monitored. 
This has proven to be somewhat fragile due to the lack of locking and the 
possibility of races and stale data.

By adding a {{--pidfile}} option, we can obtain a lock on the file to prevent 
multiple binaries from starting, and to enable the tooling to validate that the 
lock is held before doing any signaling. We can also do a best-effort unlink in 
the signal handler upon termination:

{code}
// Get exclusive access to the file.
fd = open(O_CREAT ...)
flock(fd, LOCK_EX | LOCK_NB)  // Non-blocking, so a held lock fails immediately.
if not locked, abort
ftruncate(fd, 0)

// Write the pid.
write(fd, "<pid>")

// Inside signal handler..
unlink(pidfile)
{code}

Digging around, looks like the open, ftruncate, write pattern is pretty common:
http://man7.org/tlpi/code/online/diff/filelock/create_pid_file.c.html
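
As a rough illustration of the open, flock, ftruncate, write sequence above, here is a minimal Python sketch. The function name {{acquire_pidfile}} and the file mode are assumptions made for this example, not part of any Mesos code. It assumes a POSIX system where {{flock}} is available.

```python
import errno
import fcntl
import os


def acquire_pidfile(path):
    """Open or create `path`, lock it exclusively, and write our pid.

    Hypothetical helper for illustration. Returns the open file descriptor,
    which must stay open for as long as the lock should be held, or None
    if another open file description already holds the lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB makes flock fail immediately instead of blocking.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as e:
        os.close(fd)
        if e.errno in (errno.EWOULDBLOCK, errno.EACCES):
            return None
        raise
    # Truncate only after the lock is held, so the pid written by a
    # running instance is never clobbered by a failed start.
    os.ftruncate(fd, 0)
    os.write(fd, str(os.getpid()).encode() + b"\n")
    return fd
```

A caller would abort startup when this returns None. Truncating after locking, rather than opening with O_TRUNC, is what keeps a losing starter from wiping the winner's pid.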

The tooling around it could check that the file is locked by the pid inside it 
before taking any action (like signaling):

*Case 1*: If the file does not exist or is not locked, then assume nothing is 
running. It's possible for something to be running and about to grab the lock, 
but we'll eventually read it correctly and converge on a single instance 
started correctly.

*Case 2*: If the file is locked, and the pid doesn't match, then assume it is 
running but not as the pid in the file (... yet). Treat this the same as 
Case 1: assume it's not running, and subsequent attempts to start will 
eventually converge on a single instance running.

*Case 3*: If the file is locked, and the pid matches the locker process, then 
assume it is running as that pid. Note that it's still possible that in between 
matching the pid and taking an action (e.g. kill), the pid may become stale, 
but the recycling pattern of pids makes it unlikely to be re-used unless there 
is a large delay.
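
The three cases could be approximated in tooling along these lines. This is a sketch under stated assumptions: {{pidfile_status}} and its state names are hypothetical, and because {{flock}} gives no portable way to ask which pid holds the lock, the sketch only checks that the pid in the file is alive. That is weaker than the "pid matches the locker process" test in Case 3.

```python
import errno
import fcntl
import os


def pidfile_status(path):
    """Classify a pidfile as ("not_running" | "starting" | "running", pid)."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except FileNotFoundError:
        return ("not_running", None)  # Case 1: file does not exist.
    try:
        try:
            # Probe with a non-blocking shared lock; it fails only while
            # another open file description holds the exclusive lock.
            fcntl.flock(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
        except OSError as e:
            if e.errno not in (errno.EWOULDBLOCK, errno.EACCES):
                raise
        else:
            fcntl.flock(fd, fcntl.LOCK_UN)
            return ("not_running", None)  # Case 1: file is not locked.
        data = os.read(fd, 32).strip()
    finally:
        os.close(fd)

    try:
        pid = int(data)
    except ValueError:
        return ("starting", None)  # Case 2: locked, pid not written yet.
    if pid <= 0:
        return ("starting", None)
    try:
        os.kill(pid, 0)  # Signal 0 only checks that the pid exists.
    except ProcessLookupError:
        return ("starting", None)  # Case 2: pid in the file is stale.
    except PermissionError:
        pass  # The process exists but belongs to another user.
    return ("running", pid)  # Approximation of Case 3 (liveness only).
```

Note that the probe briefly holds a shared lock, so a daemon starting at that instant could fail to get its exclusive lock and would need to retry. Querying the lock without taking it would avoid this.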

It seems like some tools already do this signal wrapping (note the comment 
about fcntl and note the race from (3) in the BUGS section):
http://manpages.ubuntu.com/manpages/natty/man8/ovs-kill.8.html

  was:
Add a {{--pidfile}} option to the common logging flags. Right now we use a 
number of wrapper scripts to try and keep up a 
{{/var/run/mesos/mesos-slave.pid}} in order to be able to monitor the process.  
It would be nice if this extra (somewhat fragile) wrapper was not necessary.

Following implementation of this command line option, consider adding automatic 
removal of the file, as well as locking the file as a secondary signal that 
there is a live mesos-slave to new slaves attempting to start.

        Summary: Add a --pidfile option to master and agent binaries.  (was: 
mesos-slave and mesos-master should have a --pidfile option)

Expanded the description to capture more of the suggested implementation and 
how monit-like tooling and signaling tooling will leverage this.

> Add a --pidfile option to master and agent binaries.
> ----------------------------------------------------
>
>                 Key: MESOS-1648
>                 URL: https://issues.apache.org/jira/browse/MESOS-1648
>             Project: Mesos
>          Issue Type: Improvement
>          Components: master, slave
>            Reporter: Tobias Weingartner
>            Assignee: Greg Mann
>              Labels: newbie, twitter


