Allen Wittenauer commented on HADOOP-13717:

bq. Man, the balancer is a can of worms. Not a daemon, but runs awkwardly 
longer than other normal commands. This is the root of potential differences in 

Yup. Completely agree. :)

bq. What do you propose we do? Minimally, we should make sure all the members 
of the balancer family have the same daemonization behavior, which is currently 

Yes, we probably should make sure they are treated the same in the hdfs script 
if they aren't.  We should definitely avoid adding more sbin scripts.  My hope 
is that in 4.x we can wipe out most of sbin and reduce our code footprint.

bq. If part of the answer is that the balancer family are daemons and need a 
HADOOP_LOG_DIR and a log4j.properties, that's fine with me. Not a hard change 
on our side.

I was thinking about what kind of interfaces/guarantees we provide 3rd parties. 
We make no promises about the content of log4j that I could find, so that's an 
easy one. But if a non-ASF jar gets added to the classpath via shellprofile, 
what would the expectations for HADOOP_LOG_DIR and -Dhadoop.log.dir be?  The 
key might be hadoop-env.sh:

# Where (primarily) daemon log files are stored.
# ${HADOOP_HOME}/logs by default.
# Java property: hadoop.log.dir
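
For reference, the env-var-to-property plumbing amounts to roughly this (a 
simplified sketch with illustrative paths; the real resolution logic in 
hadoop-functions.sh does considerably more):

```shell
# Hedged sketch (not the actual hadoop-functions.sh code): how HADOOP_LOG_DIR
# ends up on the JVM command line as -Dhadoop.log.dir.
unset HADOOP_LOG_DIR              # demonstrate the documented default
HADOOP_HOME="/opt/hadoop"         # assumed install location for this example
HADOOP_LOG_DIR="${HADOOP_LOG_DIR:-${HADOOP_HOME}/logs}"
HADOOP_OPTS="-Dhadoop.log.dir=${HADOOP_LOG_DIR}"
echo "${HADOOP_OPTS}"
```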

It's pretty clear that HADOOP_LOG_DIR is expected to point somewhere valid when 
we run as a daemon.  That leaves us with basically three choices:

1. If there is general agreement in the community that balancer and friends 
should run with HADOOP_SUBCMD_SUPPORTDAEMONIZATION=true, then HADOOP_LOG_DIR 
should be set to something writable (e.g., /tmp) whenever they are executed.

2. If balancer should run with HADOOP_SUBCMD_SUPPORTDAEMONIZATION=false, it 
becomes a normal client command and sbin/start-balancer goes away.  
HADOOP_LOG_DIR, etc., then becomes irrelevant.

3. Some third state needs to get introduced and all of the accompanying support 
code added so that we can support it in all of the user-executable scripts.

At this point, yes, I think the easiest path forward really is #1: 
HADOOP_LOG_DIR must point somewhere writable.  All of the other options have a 
lot more pain involved, for us and/or the end users.
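
A minimal sketch of what #1 implies (hypothetical fallback logic, not a 
committed patch; the bad-configuration path is a stand-in):

```shell
# Hypothetical sketch of option #1: if the configured HADOOP_LOG_DIR is not
# writable for a non-daemon balancer run, fall back to somewhere writable.
HADOOP_LOG_DIR="/nonexistent/hadoop-logs"   # stand-in for a bad configuration
if [ ! -w "${HADOOP_LOG_DIR}" ]; then
  HADOOP_LOG_DIR="/tmp"
fi
echo "balancer logs will land in ${HADOOP_LOG_DIR}"
```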

> Shell scripts call hadoop_verify_logdir even when command is not started as 
> daemon
> ----------------------------------------------------------------------------------
>                 Key: HADOOP-13717
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13717
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: scripts
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Andrew Wang
> Issue found when working with the HDFS balancer.
> In {{hadoop_daemon_handler}}, it calls {{hadoop_verify_logdir}} even for the 
> "default" case which calls {{hadoop_start_daemon}}. {{daemon_outfile}} which 
> specifies the log location isn't even used here, since the command is being 
> started in the foreground.
> I think we can push the {{hadoop_verify_logdir}} call down into 
> {{hadoop_start_daemon_wrapper}} instead, which does use the outfile.
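
The move the description suggests could be sketched roughly as follows 
(function bodies are simplified stand-ins, not the real hadoop-functions.sh 
implementations):

```shell
# Simplified stand-ins (assumption) for the relevant functions, showing the
# log-dir check pushed down into the backgrounding wrapper only.
hadoop_verify_logdir() {
  if [ ! -w "${HADOOP_LOG_DIR}" ]; then
    echo "ERROR: ${HADOOP_LOG_DIR} is not writable." >&2
    exit 1
  fi
}

hadoop_start_daemon() {
  # Foreground start: no outfile is written, so no log-dir check here.
  "$@"
}

hadoop_start_daemon_wrapper() {
  # Backgrounded start: output goes to an outfile, so verify the log dir first.
  daemon_outfile="$1"
  shift
  hadoop_verify_logdir
  "$@" >> "${daemon_outfile}" 2>&1 &
}

HADOOP_LOG_DIR="/tmp"
hadoop_start_daemon echo "foreground start, no logdir check"
```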

This message was sent by Atlassian JIRA
