[
https://issues.apache.org/jira/browse/HADOOP-13632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Wang updated HADOOP-13632:
---------------------------------
Attachment: HADOOP-13632.001.patch
Here's a patch that moves us over to {{hadoop_status_daemon}}. I tested it
manually with an empty config that causes the NN to abort quickly. I left out
the error message, but I can add it if you think it wouldn't hurt.
The timing window is quite narrow, though. If I instead use a valid config with
an unformatted namedir, so that the NN dies later during initialization, the
check doesn't trigger.
Since this is a pretty common error, we could try to catch it by extending
the timer loop. I remember talking to a Cloudera Manager engineer who maintains
a similar startup script, and CM waits for longer than 5s (I think 30s?) to
confirm that the process is still alive.
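A minimal sketch of what a longer liveness window could look like. This is not
taken from the patch: the function name {{daemon_survives}} is hypothetical, and
the 30s default is only the half-remembered CM figure mentioned above.

```bash
#!/usr/bin/env bash

# Hypothetical helper, not part of hadoop-functions.sh.
# Polls a pid once per second for up to <timeout> seconds.
# Returns 0 if the process is still alive when the window ends,
# 1 as soon as the process is found to be gone.
daemon_survives() {
  local pid=$1
  local timeout=${2:-30}   # assumed default, in seconds
  local elapsed=0

  while [ "${elapsed}" -lt "${timeout}" ]; do
    # kill -0 delivers no signal; it only checks that the pid exists
    if ! kill -0 "${pid}" 2>/dev/null; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done

  # final check at the end of the window
  kill -0 "${pid}" 2>/dev/null
}
```

A caller would run this against the daemon's pid before renicing, and report
that the process failed to start when it returns non-zero. The obvious cost is
that every successful start then blocks for the full window.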
Thoughts?
> Daemonization does not check process liveness before renicing
> -------------------------------------------------------------
>
> Key: HADOOP-13632
> URL: https://issues.apache.org/jira/browse/HADOOP-13632
> Project: Hadoop Common
> Issue Type: Bug
> Components: scripts
> Affects Versions: 3.0.0-alpha1
> Reporter: Andrew Wang
> Attachments: HADOOP-13632.001.patch
>
>
> If you try to daemonize a process that is incorrectly configured, it will die
> quite quickly. However, the daemonization function will still try to renice
> it even if it's down, leading to something like this for my namenode:
> {noformat}
> -> % bin/hdfs --daemon start namenode
> ERROR: Cannot set priority of namenode process 12036
> {noformat}
> It'd be more user-friendly if, instead of this renice error, we said that the
> process couldn't be started.