[ 
https://issues.apache.org/jira/browse/HADOOP-14855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160023#comment-16160023
 ] 

Steve Loughran commented on HADOOP-14855:
-----------------------------------------

Note that the solution in HADOOP-9086, having daemons hold exclusive locks, is 
the way to guarantee that a named service is running or not running, because 
the moment a process dies its locks are released. If the lock can't be acquired, 
then the process is running (live or zombie).
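
As a rough illustration of the locking idea (not the HADOOP-9086 implementation itself), here is a minimal Python sketch that assumes a POSIX system with flock(2). The function name try_acquire_daemon_lock and the lock file path are hypothetical, chosen only for this example.

```python
import fcntl
import os


def try_acquire_daemon_lock(lock_path):
    """Try to take an exclusive, non-blocking lock on lock_path.

    Returns the open file object if the lock was acquired (the caller
    must keep it open for as long as the daemon runs), or None if some
    other open file description already holds the lock.
    """
    lock_file = open(lock_path, "a+")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return None
    # Record our pid for humans reading the file; the lock, not the
    # pid, is what proves the daemon is alive.
    lock_file.seek(0)
    lock_file.truncate()
    lock_file.write(str(os.getpid()))
    lock_file.flush()
    return lock_file
```

A start script built on this would refuse to launch only when the call returns None. Because the kernel drops the lock when the holder exits, a stale file left behind by a crashed daemon does not block a restart.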

> Hadoop scripts may errantly believe a daemon is still running, preventing it 
> from starting
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-14855
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14855
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: scripts
>    Affects Versions: 3.0.0-alpha4
>            Reporter: Aaron T. Myers
>
> I encountered a case recently where the NN wouldn't start, with the error 
> message "namenode is running as process 16769.  Stop it first." In fact the 
> NN was not running at all, but rather another long-running process was 
> running with this pid.
> It looks to me like our scripts just check to see if _any_ process is running 
> with the pid that the NN (or any Hadoop daemon) most recently ran with. This 
> is clearly not a fool-proof way of checking to see if a particular type of 
> daemon is now running, as some other process could start running with the 
> same pid since the daemon in question was previously shut down.
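
The kind of check described in the report can be sketched as follows. This is an illustrative Python approximation, not the actual Hadoop script code, and the function name pid_is_alive is hypothetical. It only asks whether some process has the given pid, which is why a recycled pid gives a false positive.

```python
import os


def pid_is_alive(pid):
    """Return True if any process currently has this pid.

    Says nothing about whether that process is the daemon that
    originally wrote the pid file.
    """
    try:
        os.kill(pid, 0)  # signal 0 checks existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```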



