[
https://issues.apache.org/jira/browse/HADOOP-9085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503159#comment-13503159
]
Steve Loughran commented on HADOOP-9085:
----------------------------------------
Pid recycling is a permanent problem on Unix systems; you are correct that
something needs to be done. We can't rely on deleting the pid file on a
successful shutdown either, as all forms of killing are "successful", even a
server reboot.
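To make the failure mode concrete, here is a minimal Python sketch of a
pid-file liveness test equivalent to {{kill -0 `cat $pid`}}. It is not the
code in hadoop-daemon.sh; the function name and the pid file path are
hypothetical and only illustrate the logic.

```python
import os


def pid_file_says_running(pid_file):
    """Mimic a `kill -0 <pid from file>` liveness test (illustrative only)."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        # No pid file, or its content is not a number: treat as not running.
        return False
    try:
        os.kill(pid, 0)  # signal 0 delivers nothing, it only checks the pid
    except ProcessLookupError:
        return False     # nothing currently owns this pid
    except PermissionError:
        return True      # the pid exists but belongs to another user
    return True          # the pid exists, whatever program now owns it
```

The check only proves that some process currently owns the number in the
file. A stale pid file whose pid has been recycled by an unrelated process
makes this function return True, which is the false positive reported here.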
I don't think the proposed patch would work as it's still looking for a file
{{$pid}}, even though it's no longer needed, and that file is also used in the
error text. Better to skip the -f check and use {{$curpid}} in the error. Even
after that, it's pretty brittle against unintentional command matches.
What we need to do is move away from pid-file-liveness tests altogether.
There is a far more robust alternative: the service, once started, should
create an exclusive write lock on a well-known file. When the process dies, the
OS automatically releases this lock. I'll open a JIRA on it.
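A rough sketch of the lock-based idea, written in Python for brevity. It
assumes a Unix system with flock() semantics; the function names and the lock
file path are hypothetical, and a real Hadoop service would take the lock from
inside the daemon itself rather than from a helper like this.

```python
import fcntl


def acquire_service_lock(lock_path):
    """Try to take an exclusive, non-blocking lock on lock_path.

    Returns the open file object on success; the service must keep it open
    for its whole lifetime. Returns None if the lock is already held.
    """
    f = open(lock_path, "a+")  # create the file if needed, never truncate
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f


def is_service_running(lock_path):
    """Liveness test: the service is up if its lock cannot be taken."""
    probe = acquire_service_lock(lock_path)
    if probe is None:
        return True
    probe.close()  # closing the file releases the probe's own lock
    return False
```

Because the kernel drops the lock when the holder's file descriptor goes away,
a crash, a kill -9 or a reboot leaves nothing stale behind, and no pid is ever
compared. One caveat under these assumptions: advisory locks can behave
differently on network filesystems, so the lock file is best kept on local disk.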
> start namenode failure, because pid of namenode pid file is other process pid
> or thread id before start namenode
> ---------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-9085
> URL: https://issues.apache.org/jira/browse/HADOOP-9085
> Project: Hadoop Common
> Issue Type: Bug
> Components: bin
> Affects Versions: 2.0.1-alpha, 2.0.3-alpha
> Environment: NA
> Reporter: liaowenrui
> Fix For: 2.0.1-alpha, 2.0.2-alpha, 2.0.3-alpha
>
>
> If the pid recorded in the namenode pid file belongs to another process or
> thread when the namenode is started, the start fails. Before starting the
> namenode, the hadoop-daemon.sh script checks the pid from the pid file with
> kill -0. When that pid now belongs to another process or thread, kill -0
> returns success, which the script takes to mean that the namenode is running.
> In reality, the namenode is not running.
> 2338 is dead namenode pid
> 2305 is datanode pid
> cqn2:/tmp # kill -0 2338
> cqn2:/tmp # ps -wweLo pid,ppid,tid | grep 2338
> 2305 1 2338