[ 
https://issues.apache.org/jira/browse/HADOOP-14855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160022#comment-16160022
 ] 

Steve Loughran commented on HADOOP-14855:
-----------------------------------------

You could always check whether it's a Java process, which is resilient to any 
issues with the process name. How do you check that? jstack will do it, though 
its exit code of 1 means both "no process" and "process not listening":
{code}
bash-3.2$ time jstack 470
470: Unable to open socket file: target process not responding or HotSpot VM 
not loaded
The -F option can be used when the target process is not responding

real    0m5.439s
user    0m0.127s
sys     0m0.038s
bash-3.2$ echo $?
1
{code}

If the process is a Java one, you get the stack trace and an exit code of 0.

I could imagine a sequence of file -> pid -> kill -0 -> jstack, so the jstack 
check is only done if the process is known to be running. 
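
A minimal sketch of that sequence, assuming a pid file that holds a single 
numeric pid. The function name daemon_is_running and the JSTACK override 
variable are hypothetical and are not part of the existing Hadoop scripts. Note 
that kill -0 also fails when the process belongs to another user, and jstack 
can generally only attach to a JVM owned by the calling user, so this only 
approximates "is my daemon still alive".

```shell
# Hypothetical helper: succeeds only if the pid file names a live Java process.
daemon_is_running() {
  pidfile="$1"

  # Step 1: file -> pid. No pid file means no daemon.
  [ -f "$pidfile" ] || return 1
  pid=$(cat "$pidfile" 2>/dev/null)

  # Reject empty or non-numeric content.
  case "$pid" in
    ''|*[!0-9]*) return 1 ;;
  esac

  # Step 2: kill -0 sends no signal; it only tests whether the pid can be signalled.
  kill -0 "$pid" 2>/dev/null || return 1

  # Step 3: jstack exits 0 only if it could attach to a JVM at that pid.
  # JSTACK is an assumed override for the path to the jstack binary.
  "${JSTACK:-jstack}" "$pid" >/dev/null 2>&1
}
```

A caller would treat a non-zero return as "stale pid file, safe to start". 
Because jstack runs only after kill -0 succeeds, the multi-second wait shown 
above is paid only when some live process actually holds the pid.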

> Hadoop scripts may errantly believe a daemon is still running, preventing it 
> from starting
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-14855
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14855
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: scripts
>    Affects Versions: 3.0.0-alpha4
>            Reporter: Aaron T. Myers
>
> I encountered a case recently where the NN wouldn't start, with the error 
> message "namenode is running as process 16769.  Stop it first." In fact the 
> NN was not running at all, but rather another long-running process was 
> running with this pid.
> It looks to me like our scripts just check to see if _any_ process is running 
> with the pid that the NN (or any Hadoop daemon) most recently ran with. This 
> is clearly not a fool-proof way of checking to see if a particular type of 
> daemon is now running, as some other process could start running with the 
> same pid since the daemon in question was previously shut down.



