Hi all,
while trying to understand the clusterdump utility, I encountered a message
"no HADOOP_CONF_DIR or HADOOP_HOME set, running locally". As per some of the
discussions in Apr'10 (
http://lucene.472066.n3.nabble.com/Dealing-with-kmean-and-meanshift-output-td708824.html#a709066),
it would be good to default the hadoop conf directory to hadoop_home/conf.
So I've changed the script to accommodate this.
*Existing code:*
if [ "$HADOOP_CONF_DIR" = "" ] || [ "$HADOOP_HOME" = "" ] || [
"$MAHOUT_LOCAL" != "" ] ; then
if [ "$HADOOP_CONF_DIR" = "" ] || [ "$HADOOP_HOME" = "" ] ; then
echo "no HADOOP_CONF_DIR or HADOOP_HOME set, running locally"
elif [ "$MAHOUT_LOCAL" != "" ] ; then
echo "MAHOUT_LOCAL is set, running locally"
fi
exec "$JAVA" $JAVA_HEAP_MAX $MAHOUT_OPTS -classpath "$CLASSPATH" $CLASS
"$@"
else
echo "running on hadoop, using HADOOP_HOME=$HADOOP_HOME and
HADOOP_CONF_DIR=$HADOOP_CONF_DIR"
*New code:*
if [ "$HADOOP_HOME" = "" ] || [ "$MAHOUT_LOCAL" != "" ] ; then
if [ "$HADOOP_HOME" = "" ] ; then
echo "no HADOOP_HOME set, running locally"
elif [ "$MAHOUT_LOCAL" != "" ] ; then
echo "MAHOUT_LOCAL is set, running locally"
fi
exec "$JAVA" $JAVA_HEAP_MAX $MAHOUT_OPTS -classpath "$CLASSPATH" $CLASS
"$@"
else
if [ "$HADOOP_CONF_DIR" = "" ] ; then
HADOOP_CONF_DIR=$HADOOP_HOME/conf
echo "no HADOOP_CONF_DIR set. so defaulting HADOOP_CONF_DIR to
$HADOOP_HOME/conf "
fi
echo "running on hadoop, using HADOOP_HOME=$HADOOP_HOME and
HADOOP_CONF_DIR=$HADOOP_CONF_DIR"
In my local machine, I've just set HADOOP_HOME and the script defaults the
$HADOOP_CONF_DIR directory and runs on hadoop.
If this issue / fix looks valid, I'll open a JIRA issue and attach the
patch.
Appreciate your thoughts / feedbacks,
Joe.