[ 
https://issues.apache.org/jira/browse/HADOOP-9086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503163#comment-13503163
 ] 

Steve Loughran commented on HADOOP-9086:
----------------------------------------

This is the strategy adopted by {{daemontools}}: 
[http://cr.yp.to/daemontools/setlock.html]

The {{setlock}} command does not require any changes to the invoked code, but 
it does require that the service is only ever started via {{setlock}}. 
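
As a rough illustration of the wrapper approach, here is a minimal Python sketch of a setlock-style launcher. It is not the daemontools implementation: the function name {{run_locked}} and the {{LOCK_BUSY}} return value are made up for this example, and it assumes a Unix system where {{flock(2)}} is available.

```python
import fcntl
import os
import subprocess

# Hypothetical return value meaning "someone else holds the lock";
# chosen for this sketch only.
LOCK_BUSY = 111


def run_locked(lock_path, command):
    """Run `command` while holding an exclusive lock on `lock_path`.

    Returns the command's exit code, or LOCK_BUSY without running the
    command if another process already holds the lock.
    """
    fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        try:
            # Non-blocking: fail at once instead of queueing behind the holder.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return LOCK_BUSY
        # The child inherits the locked descriptor, so the lock stays held
        # for as long as the wrapper or the child keeps it open.
        return subprocess.run(command, pass_fds=(fd,)).returncode
    finally:
        os.close(fd)
```

The invoked command needs no changes, but anything that starts the service without going through the wrapper bypasses the lock entirely.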

For Hadoop, some options are:
# Have the hadoop-service/init.d scripts use something similar to setlock.
# Move the lock creation logic into the singleton services themselves. They'd 
take an option naming the file to create, attempt to create/open that file 
with an exclusive write lock on startup, and exit immediately if that could 
not be done.
# With the second option, the service scripts could omit the liveness checks 
themselves, because the daemon would do it for them. However, pid files have 
other uses (e.g. {{sudo kill `cat /var/log/hadoop/namenode.pid`}}). They should 
still be created, just not used for enforcing singleton logic.
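
To make the in-daemon variant concrete, here is a minimal Python sketch of a service taking its own lock on startup. It is an illustration only, not Hadoop code: {{start_as_singleton}}, {{lock_path}} and {{pid_path}} are hypothetical names, and it assumes Unix {{flock(2)}} semantics. The pid file is still written, but only as a convenience for operators; the lock alone decides whether the process may run.

```python
import fcntl
import os


def start_as_singleton(lock_path, pid_path):
    """Try to become the single running instance.

    Returns an open file descriptor that holds the exclusive lock, or None
    if another instance already holds it. The caller must keep the
    descriptor open for the lifetime of the process; the OS drops the lock
    when the process exits, however it exits.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Non-blocking exclusive lock: fail at once if it is already held.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    # Informational pid file, e.g. for `kill $(cat ...)`. It plays no part
    # in the singleton check.
    with open(pid_path, "w") as pid_file:
        pid_file.write(f"{os.getpid()}\n")
    return fd
```

A daemon would call this first thing in its startup path and exit with an error if it gets {{None}} back.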

This *should* also work on Windows, with the caveat that older non-Server 
editions of Windows didn't always release file locks on process termination. 
Testing would be required there.

                
> Enforce process singleton rules through an exclusive write lock on a file, 
> not a pid file +kill -0,
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-9086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9086
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 1.1.1, 2.0.3-alpha
>         Environment: Unix/Linux. 
>            Reporter: Steve Loughran
>
> The {{hadoop-daemon.sh}} script (and other liveness monitors) probe the 
> existence of a daemon service by a {{kill -0}} of a process id picked up from 
> a pid file. 
> This is flawed:
> # pid file locations may change with installations.
> # Linux and Unix recycle pids, leading to false positives: the scripts think 
> the process is running, when another process is.
> # It doesn't work on Windows.
> Having the processes acquire an exclusive write lock on a known file would 
> delegate lock management, and implicitly liveness, to the OS itself. When the 
> process dies, the lock is released (on Unixes).

