[ 
https://issues.apache.org/jira/browse/HADOOP-14855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407109#comment-16407109
 ] 

Robert Kanter commented on HADOOP-14855:
----------------------------------------

I think we should try to lower the chances of this happening.  If a user runs into 
this issue, they're going to be very confused and won't know how to fix it 
without looking into our shell scripts (if they even figure out that they 
should look there).  While it's not 100% perfect, I think we should do 
something along the lines of what [~aw] suggested in [this earlier 
comment|https://issues.apache.org/jira/browse/HADOOP-14855?focusedCommentId=16159435&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16159435]:
{code}
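pspid=$(ps -fp "${pid}" 2>/dev/null)

# ps exits 0 only if a process with this pid exists
if [[ $? -eq 0 ]]; then
  # only treat it as our daemon if the command line matches
  if [[ ${pspid} =~ Dproc_${daemonname} ]]; then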
{code}
This should significantly cut down the likelihood of running into this issue 
and seems pretty easy to do at little cost.  I'll try to come up with a patch 
soon.
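
As a rough illustration of the check described above, here is a minimal, self-contained sketch. It is not the actual patch or the real Hadoop shell function: the name daemon_is_running, its arguments, and the pid file handling are hypothetical, and it assumes a ps that accepts -f and -p and that the daemon's JVM was started with -Dproc_<daemonname> on its command line.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the Hadoop scripts).
# Returns 0 only if the pid stored in the pid file belongs to a live process
# whose command line contains Dproc_<daemonname>; returns 1 otherwise.
daemon_is_running() {
  local pidfile=$1
  local daemonname=$2
  local pid pspid

  # No pid file: nothing to check.
  [[ -f "${pidfile}" ]] || return 1

  # Ignore pid files that do not contain a plain number.
  pid=$(cat "${pidfile}" 2>/dev/null)
  [[ "${pid}" =~ ^[0-9]+$ ]] || return 1

  # ps exits non-zero when no process currently has this pid.
  pspid=$(ps -fp "${pid}" 2>/dev/null) || return 1

  # The pid is in use; report "running" only if it looks like our daemon.
  [[ "${pspid}" =~ Dproc_${daemonname} ]]
}
```

A caller could then write something like `if daemon_is_running "${pidfile}" namenode; then echo "namenode is running"; fi`. An unrelated process that happens to reuse the old pid no longer blocks startup, although a process whose command line happens to contain the same Dproc_ string would still be mistaken for the daemon.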

> Hadoop scripts may errantly believe a daemon is still running, preventing it 
> from starting
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-14855
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14855
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: scripts
>    Affects Versions: 3.0.0-alpha4
>            Reporter: Aaron T. Myers
>            Priority: Major
>
> I encountered a case recently where the NN wouldn't start, with the error 
> message "namenode is running as process 16769.  Stop it first." In fact the 
> NN was not running at all, but rather another long-running process was 
> running with this pid.
> It looks to me like our scripts just check to see if _any_ process is running 
> with the pid that the NN (or any Hadoop daemon) most recently ran with. This 
> is clearly not a fool-proof way of checking to see if a particular type of 
> daemon is now running, as some other process could start running with the 
> same pid since the daemon in question was previously shut down.


