[ 
https://issues.apache.org/jira/browse/HADOOP-13632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514317#comment-15514317
 ] 

Allen Wittenauer edited comment on HADOOP-13632 at 9/22/16 7:52 PM:
--------------------------------------------------------------------

We're basically racing against the process startup time and subsequent failure. 
We might pass that ps but still fail the renice, disown, or subsequent ps 
check.  That said, it wouldn't hurt to put another ps check after the timer and 
before the pid file write; it would hopefully catch a good chunk of the early 
failures.
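
A minimal sketch of that extra check, for illustration only. The function name 
start_and_verify, its arguments, and its return codes are hypothetical and are 
not the actual hadoop-functions.sh code. It also assumes kill -0 is an 
acceptable stand-in for the ps check. The pid file is written only if the 
process survived the wait:

```shell
# Hypothetical helper (not part of Hadoop): start a command in the
# background, wait briefly, and write the pid file only if the process
# is still alive after the wait.
# Usage: start_and_verify <pidfile> <wait-seconds> <command> [args...]
start_and_verify() {
  local pidfile=$1
  local wait_secs=$2
  shift 2

  "$@" >/dev/null 2>&1 &
  local pid=$!

  # Give a misconfigured process time to fail.
  sleep "${wait_secs}"

  # kill -0 delivers no signal; it only reports whether the pid can
  # still be signalled, which is a cheap liveness probe.
  if ! kill -0 "${pid}" >/dev/null 2>&1; then
    echo "ERROR: process ${pid} exited during startup" >&2
    return 1
  fi

  # Only now record the pid.
  echo "${pid}" > "${pidfile}" || return 2
  return 0
}
```

This only narrows the window: the process can still die between the probe and 
the pid file write, or during a later renice or disown.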

The out file may or may not be the correct file to look at, BTW.  For example, 
fs.defaultFS pointing to file://// will leave the out file empty.

Two side notes: 

* I wonder why this code doesn't use hadoop_status_daemon.  I'm sure there is a 
good reason, most likely that it was written before that function existed.  It 
probably should use it, though, so that we pick up whatever features someone 
adds if they replace it.  On the flip side, this code is extremely time 
critical (racy!), so the faster we complete, the better.

* This is some of my least favorite code that I've written.  Handling pid files 
outside of a daemon is full of fragility even outside of the edge cases. :(
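
To illustrate the kind of pid file check being discussed, here is a rough 
sketch. The function pidfile_status is hypothetical and is not the real 
hadoop_status_daemon; the return codes loosely follow the LSB init-script 
status convention (0 running, 1 dead but pid file exists, 3 not running):

```shell
# Hypothetical helper (not the real hadoop_status_daemon).
# Returns 0 if the pid recorded in the file is alive, 1 if the file
# exists but the process is gone or the contents are unusable (stale
# pid file), and 3 if there is no pid file at all.
pidfile_status() {
  local pidfile=$1
  local pid

  [ -f "${pidfile}" ] || return 3

  pid=$(cat "${pidfile}" 2>/dev/null)

  # Reject empty or non-numeric contents before probing.
  case "${pid}" in
    ''|*[!0-9]*) return 1 ;;
  esac

  # kill -0 also fails when the process belongs to another user, so
  # this sketch treats "not signallable" the same as "dead".
  if kill -0 "${pid}" >/dev/null 2>&1; then
    return 0
  fi
  return 1
}
```

Even this is fragile: if the pid has been reused by an unrelated process, a 
stale file looks live, which is one of the edge cases mentioned above.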


> Daemonization does not check process liveness before renicing
> -------------------------------------------------------------
>
>                 Key: HADOOP-13632
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13632
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: scripts
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Andrew Wang
>
> If you try to daemonize a process that is incorrectly configured, it will die 
> quite quickly. However, the daemonization function will still try to renice 
> it even if it's down, leading to something like this for my namenode:
> {noformat}
> -> % bin/hdfs --daemon start namenode
> ERROR: Cannot set priority of namenode process 12036
> {noformat}
> It'd be more user-friendly if, instead of this renice error, we said that the 
> process couldn't be started.


