[jira] Commented: (MAHOUT-573) hadoop job config parameter,e.g., -Dmapred.cache.archives, support in mahout wrapper

Shige Takeda (JIRA) Mon, 03 Jan 2011 00:52:10 -0800

    [ https://issues.apache.org/jira/browse/MAHOUT-573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976649#action_12976649 ]


Shige Takeda commented on MAHOUT-573:
-------------------------------------

It looks like I confused the Java system property syntax, -Dkey=value, with 
Hadoop's generic option syntax, -D key=value.
As I mentioned above, my concern was to pass Hadoop config parameters, so 
ToolRunner seems to be the way to achieve that.
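
To make the distinction concrete, here is a minimal sketch that prints (rather than runs) the two command shapes. The helper name show_cmd, the jar name example.job, and the class org.example.MyDriver are placeholders invented for illustration; they are not part of Mahout or of the patch. The Hadoop form only reaches the job configuration when the driver parses generic options, for example by running through ToolRunner.

```shell
#!/bin/sh
# Hypothetical helper: prints a command line instead of executing it.
# "example.job" and "org.example.MyDriver" are placeholders.
show_cmd() {
  style=$1
  prop=$2
  case "$style" in
    jvm)
      # JVM system property: sits before the class name and is read by the
      # JVM itself (System.getProperty), not by Hadoop's Configuration.
      echo "java -D$prop -classpath example.job org.example.MyDriver"
      ;;
    hadoop)
      # Hadoop generic option: sits after the class name and is applied to
      # the job Configuration only if the driver parses generic options.
      echo "hadoop jar example.job org.example.MyDriver -D $prop"
      ;;
    *)
      echo "unknown style: $style" >&2
      return 1
      ;;
  esac
}

show_cmd jvm mapred.job.queue.name=unfunded
show_cmd hadoop mapred.job.queue.name=unfunded
```

Generic options are expected before the application's own arguments, so in the Hadoop form the -D pairs would come right after the class name.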

> hadoop job config parameter,e.g., -Dmapred.cache.archives, support in mahout 
> wrapper
> ------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-573
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-573
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Utils
>    Affects Versions: 0.4
>         Environment: fedora 14 running on VirtualBox for Windows
> Windows Vespa
>            Reporter: Shige Takeda
>            Priority: Minor
>
> In order to specify a custom analyzer that utilizes a Japanese Morphological 
> Analyzer "Igo" referring to dictionary files on HDFS for seq2sparse, I needed 
> to pass the following job config:
> mapred.cache.archives="hdfs://localhost:9000/user/stakeda/ipadic.zip#ipadic"
> mapred.create.symlink=yes
> This way, the IgoAnalyzer can read dictionaries from "./ipadic" as follows:
> https://github.com/smtakeda/mahout/blob/project101210/examples/src/main/java/org/apache/mahout/analysis/IgoAnalyzer.java
> Another use case: I needed to set mapred.job.queue.name so that jobs get the 
> appropriate priority in the work environment:
> https://github.com/smtakeda/mahout/blob/yahoo/core/src/main/java/org/apache/mahout/clustering/canopy/CanopyDriver.java
> ...
> conf.set("mapred.job.queue.name", "unfunded"); 
> Based on these two use cases, I would like to request/propose to add hadoop 
> job option support, i.e., -Dmapred.cache.archives=... to mahout wrapper.
> Changes are roughly expected in two ends; "bin/mahout" and all main functions 
> that parse command lines. Here is a quick patch for "bin/mahout":
> localhost ~/workspace/mahout_git/bin: git diff -r f13e517408f20f75009e05e6c72c5fbb836e3f66 mahout
> diff --git a/bin/mahout b/bin/mahout
> index 774fa11..9d78ceb 100755
> --- a/bin/mahout
> +++ b/bin/mahout
> @@ -116,6 +116,14 @@ CLASSPATH=${CLASSPATH}:$JAVA_HOME/lib/tools.jar
>  # so that filenames w/ spaces are handled correctly in loops below
>  IFS=
>  
> +
> +# JAVA_PROPERTIES
> +JAVA_PROPERTIES=
> +while [ $1 ] && [ ${1:0:2} == "-D" ] ; do 
> +    JAVA_PROPERTIES="$1 $JAVA_PROPERTIES"
> +    shift
> +done
> +
>  if [ $IS_CORE == 0 ] 
>  then
>    # add release dependencies to CLASSPATH
> @@ -198,7 +206,7 @@ if [ "$HADOOP_HOME" = "" ] || [ "$MAHOUT_LOCAL" != "" ] ; then
>    elif [ "$MAHOUT_LOCAL" != "" ] ; then 
>      echo "MAHOUT_LOCAL is set, running locally"
>    fi
> -  exec "$JAVA" $JAVA_HEAP_MAX $MAHOUT_OPTS -classpath "$CLASSPATH" $CLASS "$@"
> +  exec "$JAVA" $JAVA_HEAP_MAX $JAVA_PROPERTIES $MAHOUT_OPTS -classpath "$CLASSPATH" $CLASS "$@"
>  else
>    echo "Running on hadoop, using HADOOP_HOME=$HADOOP_HOME"
>    if [ "$HADOOP_CONF_DIR" = "" ] ; then
> @@ -213,7 +221,7 @@ else
>      exit 1
>    else
>    export HADOOP_CLASSPATH=$MAHOUT_CONF_DIR:${HADOOP_CLASSPATH}
> -  exec "$HADOOP_HOME/bin/hadoop" jar $MAHOUT_JOB $CLASS "$@"
> +  exec "$HADOOP_HOME/bin/hadoop" jar $MAHOUT_JOB $CLASS "$@" $JAVA_PROPERTIES
>    fi 
>  fi
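
The option-collecting loop from the patch above can be tried on its own. The following is a minimal sketch, not the patch itself: the function name split_props and the variables props and rest are made up for illustration. It uses a POSIX case pattern instead of the bash substring ${1:0:2}, and it appends each option so the original order is kept (the patch prepends, which reverses it). Values containing spaces are not handled here.

```shell
#!/bin/sh
# Hypothetical helper: peel leading -D options off an argument list.
# Results are left in the global variables "props" and "rest".
split_props() {
  props=""
  while [ $# -gt 0 ]; do
    case "$1" in
      -D*)
        # Append, inserting a separating space only after the first item.
        props="${props:+$props }$1"
        shift
        ;;
      *)
        # First argument that is not a -D option ends the scan.
        break
        ;;
    esac
  done
  rest="$*"
}

split_props -Dmapred.job.queue.name=unfunded -Dmapred.create.symlink=yes seq2sparse -i in
echo "props: $props"
echo "rest:  $rest"
```

Only leading -D arguments are consumed; a -D that appears after the first ordinary argument stays in rest and is passed through unchanged.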

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
