[
https://issues.apache.org/jira/browse/HADOOP-14855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159228#comment-16159228
]
Allen Wittenauer commented on HADOOP-14855:
-------------------------------------------
This is a dupe of HADOOP-9085 (and its buddy HADOOP-9086).
[[email protected]]'s comments are spot on, with this being the key one:
bq. What we need to do is move away from pid-file-liveness tests altogether.
Unfortunately, we're using Java. Doing liveness checks anywhere but in bash
is either extremely expensive due to the massive classpath or
non-portable/introduces more environmental dependencies.
Other thoughts:
1) These types of pid clashes are more on the edge case/rare side. They just
generally aren't worth spending the effort on.
2) Given user-functions and shell profiles, it's possible for end users (or
vendors) to replace the pid checking/handling on their own. I'm expecting
experienced admins to replace it with daemontools and the like.
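For illustration only, here is a minimal sketch of what a stricter,
site-supplied pid check could look like. It treats a pid as "ours" only when
the process is alive and its command line contains an expected marker string.
The function name pid_is_expected_daemon and both of its arguments are
hypothetical and not part of Hadoop, and the sketch assumes a ps that accepts
"-p PID -o args=".

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Hadoop: succeed only if the pid recorded in
# a pid file belongs to a live process whose command line contains a marker.
pid_is_expected_daemon() {
  local pidfile marker pid args
  pidfile=$1
  marker=$2

  # No readable pid file means there is nothing to check.
  [ -r "${pidfile}" ] || return 1
  pid=$(cat "${pidfile}")

  # Reject empty or non-numeric pid file contents.
  case "${pid}" in
    ''|*[!0-9]*) return 1 ;;
  esac

  # Ask ps for the command line of that pid; ps fails if no such process exists.
  args=$(ps -p "${pid}" -o args= 2>/dev/null) || return 1

  # A live pid alone is not enough: the command line must contain the marker.
  case "${args}" in
    *"${marker}"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A call might look like the following, where the pid file path is illustrative
and the marker is the daemon's main class name:

    pid_is_expected_daemon /tmp/hadoop-hdfs-namenode.pid \
      org.apache.hadoop.hdfs.server.namenode.NameNode

This narrows the window for the clash described in the issue but does not
close it: an unrelated process could still carry the marker in its arguments.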
> Hadoop scripts may errantly believe a daemon is still running, preventing it
> from starting
> ------------------------------------------------------------------------------------------
>
> Key: HADOOP-14855
> URL: https://issues.apache.org/jira/browse/HADOOP-14855
> Project: Hadoop Common
> Issue Type: Bug
> Components: scripts
> Affects Versions: 3.0.0-alpha4
> Reporter: Aaron T. Myers
>
> I encountered a case recently where the NN wouldn't start, with the error
> message "namenode is running as process 16769. Stop it first." In fact the
> NN was not running at all, but rather another long-running process was
> running with this pid.
> It looks to me like our scripts just check to see if _any_ process is running
> with the pid that the NN (or any Hadoop daemon) most recently ran with. This
> is clearly not a fool-proof way of checking to see if a particular type of
> daemon is now running, as some other process could start running with the
> same pid since the daemon in question was previously shut down.