[
https://issues.apache.org/jira/browse/HADOOP-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503163#comment-13503163
]
Steve Loughran commented on HADOOP-9086:
----------------------------------------
This is the strategy adopted by {{daemontools}}:
[http://cr.yp.to/daemontools/setlock.html]
The {{setlock}} command does not modify the invoked code, but it does require
that the service is only ever started via the setlock process.
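
To make the wrapper idea concrete, here is a rough sketch in Python of a setlock-style launcher. It is not how {{setlock}} itself is implemented; the function name {{run_locked}} and the use of {{flock}} are assumptions for illustration, and it is Unix-only.

```python
import fcntl
import os
import subprocess


def run_locked(lock_path, argv):
    """Run argv while holding an exclusive lock on lock_path.

    Returns the command's exit code, or None (without starting the
    command) if another process already holds the lock.
    Hypothetical helper, not part of daemontools or Hadoop.
    """
    fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        try:
            # Non-blocking: fail at once instead of waiting for the lock.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None
        # pass_fds lets the child inherit the locked descriptor, so the
        # lock lasts as long as the child, even if this wrapper is killed.
        return subprocess.call(argv, pass_fds=(fd,))
    finally:
        os.close(fd)
```

The invoked program needs no changes, but a copy started without the wrapper bypasses the lock entirely, which is the deployment restriction noted above.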
For Hadoop, some options are
# have the hadoop-service/init.d scripts use something similar to setlock.
# move the lock creation logic into the Singleton services themselves: they'd
take an option naming the lock file, attempt to create/open that file
with an exclusive write lock on startup, and exit immediately if that fails.
# the service scripts could then omit the liveness checks themselves, because
the daemon would do it for them. However, pid files have other uses (e.g. {{sudo
kill `cat /var/log/hadoop/namenode.pid`}}), so they should still be created, just
not used for enforcing singleton logic.
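
As a minimal sketch of the second option, the startup check inside a service might look like the following. This is Python for brevity and Unix-only; {{acquire_singleton_lock}} is a hypothetical name, and a Java service would more likely use something like {{FileChannel.tryLock()}}.

```python
import fcntl
import os


def acquire_singleton_lock(lock_path):
    """Try to take an exclusive, non-blocking lock on lock_path.

    Returns the open file descriptor on success; the caller must keep it
    open for the life of the process. Returns None if another process
    already holds the lock. Hypothetical helper for illustration.
    """
    fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Someone else owns the lock: clean up and report failure.
        os.close(fd)
        return None
    return fd
```

A daemon would call this first thing on startup with the configured lock file and exit immediately on a {{None}} result. The OS drops the lock when the process dies, so no stale-lock cleanup is needed.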
This *should* also work on Windows, with the caveat that older non-Server
editions of Windows didn't always release file locks on process termination.
Testing would be required there.
> Enforce process singleton rules through an exclusive write lock on a file,
> not a pid file +kill -0,
> ---------------------------------------------------------------------------------------------------
>
> Key: HADOOP-9086
> URL: https://issues.apache.org/jira/browse/HADOOP-9086
> Project: Hadoop Common
> Issue Type: Improvement
> Components: util
> Affects Versions: 1.1.1, 2.0.3-alpha
> Environment: Unix/Linux.
> Reporter: Steve Loughran
>
> The {{hadoop-daemon.sh}} script (and other liveness monitors) probe the
> existence of a daemon service by a {{kill -0}} of a process id picked up from
> a pid file.
> This is flawed:
> # pid file locations may change with installations.
> # Linux and Unix recycle pids, leading to false positives: the scripts think
> the process is running, when another process is.
> # it doesn't work on Windows.
> Having the processes acquire an exclusive write-lock on a known file would
> delegate lock management and implicitly liveness to the OS itself. When the
> process dies, the lock is released (on Unixes).