[
https://issues.apache.org/jira/browse/HDFS-6239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968038#comment-13968038
]
xyzzy commented on HDFS-6239:
-----------------------------
I tried to work around the problem by playing with different kinds of quoting
in the following line:
exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_HOME" \;
"$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"
For example, I tried something like:
exec "$bin/slaves.sh" "--config $HADOOP_CONF_DIR cd $HADOOP_HOME ;
$bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR $@"
With that form the semicolon no longer gets quoted, but when the command
executes, the word 'datanode' (the last argument in the bash -x output in the
description) ends up unquoted. I am fairly sure there is a quoting/escaping
problem in this one line, and it only shows up in certain cases (probably
because of the long paths in my file system).
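If anyone else wants to poke at this, the small test I would use (again just a
sketch with a hypothetical host "remotehost", not taken from the Hadoop
scripts) compares what the remote shell actually runs in the escaped form
versus the single-string form, instead of relying on the bash -x rendering:
# escaped form: the ';' is passed to ssh as its own word
ssh remotehost cd "$HADOOP_HOME" \; hostname \; pwd
# single-string form: the whole remote command pre-joined locally
ssh remotehost "cd $HADOOP_HOME ; hostname ; pwd"
# if the first form misbehaves while the second works, the arguments are being
# mangled somewhere between hadoop-daemons.sh and the ssh call in slaves.sh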
> start-dfs.sh does not start remote DataNode due to escape characters
> --------------------------------------------------------------------
>
> Key: HDFS-6239
> URL: https://issues.apache.org/jira/browse/HDFS-6239
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: scripts
> Affects Versions: 1.2.1
> Environment: GNU bash, version 4.1.2(1)-release
> (x86_64-redhat-linux-gnu)
> Linux foo 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Dec 13 06:58:20 EST 2013
> x86_64 x86_64 x86_64 GNU/Linux
> AFS file system.
> Reporter: xyzzy
>
> start-dfs.sh fails to start remote data nodes and task nodes, though it is
> possible to start them manually through hadoop-daemon.sh.
> I was able to debug and find the root cause of the bug. I expected the fix to
> be trivial, but I cannot figure out how to handle it.
> hadoop-daemons.sh calls slaves.sh:
> exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_HOME" \;
> "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"
> This is what I see when debugging with bash -x: in slaves.sh, the \; becomes ';'
> + ssh xxxx.xx.xxxx.xxx cd /afs/xx.xxxx.xxx/x/x/x/xx/xxxxx/libexec/.. ';'
> /afs/xx.xxxx.xxx/x/x/x/xx/xxxx/bin/hadoop-daemon.sh --config
> /afs/xx.xxxx.xxx/x/x/x/xx/xxxx/libexec/../conf start datanode
> The problem is the ';'. Because the semicolon is surrounded by quotes, the
> command after it is not executed. I ran the above command manually and, as
> expected, the DataNode did not start; when I removed the quotes around the
> semicolon, everything worked. Note that the issue is visible only with
> bash -x; if you echo the statement, the quotes around the semicolon do not
> show up.
> This issue is always reproducible for me, and because of it, I have to
> manually start daemons on each machine.
--
This message was sent by Atlassian JIRA
(v6.2#6252)