[
https://issues.apache.org/jira/browse/MAHOUT-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shige Takeda resolved MAHOUT-573.
---------------------------------
Resolution: Not A Problem
Ah, I found that no change is required in the mahout driver itself, only in the
program drivers...
> hadoop job config parameter,e.g., -Dmapred.cache.archives, support in mahout
> wrapper
> ------------------------------------------------------------------------------------
>
> Key: MAHOUT-573
> URL: https://issues.apache.org/jira/browse/MAHOUT-573
> Project: Mahout
> Issue Type: Improvement
> Components: Utils
> Affects Versions: 0.4
> Environment: fedora 14 running on VirtualBox for Windows
> Windows Vespa
> Reporter: Shige Takeda
> Priority: Minor
>
> To use a custom analyzer with seq2sparse that wraps the Japanese morphological
> analyzer "Igo" and reads its dictionary files from HDFS, I needed to pass the
> following job config:
> mapred.cache.archives="hdfs://localhost:9000/user/stakeda/ipadic.zip#ipadic"
> mapred.create.symlink=yes
> This way, the IgoAnalyzer can read dictionaries from "./ipadic" as follows:
> https://github.com/smtakeda/mahout/blob/project101210/examples/src/main/java/org/apache/mahout/analysis/IgoAnalyzer.java
> Another use case: I needed to set mapred.job.queue.name so that jobs get the
> appropriate priority in the work environment:
> https://github.com/smtakeda/mahout/blob/yahoo/core/src/main/java/org/apache/mahout/clustering/canopy/CanopyDriver.java
> ...
> conf.set("mapred.job.queue.name", "unfunded");
> Based on these two use cases, I would like to propose adding Hadoop job option
> support, e.g., -Dmapred.cache.archives=..., to the mahout wrapper.
> Changes are expected in roughly two places: "bin/mahout" and all main functions
> that parse command lines. Here is a quick patch for "bin/mahout":
> localhost ~/workspace/mahout_git/bin: git diff -r f13e517408f20f75009e05e6c72c5fbb836e3f66 mahout
> diff --git a/bin/mahout b/bin/mahout
> index 774fa11..9d78ceb 100755
> --- a/bin/mahout
> +++ b/bin/mahout
> @@ -116,6 +116,14 @@ CLASSPATH=${CLASSPATH}:$JAVA_HOME/lib/tools.jar
> # so that filenames w/ spaces are handled correctly in loops below
> IFS=
>
> +
> +# JAVA_PROPERTIES
> +JAVA_PROPERTIES=
> +while [ $1 ] && [ ${1:0:2} == "-D" ] ; do
> + JAVA_PROPERTIES="$1 $JAVA_PROPERTIES"
> + shift
> +done
> +
> if [ $IS_CORE == 0 ]
> then
> # add release dependencies to CLASSPATH
> @@ -198,7 +206,7 @@ if [ "$HADOOP_HOME" = "" ] || [ "$MAHOUT_LOCAL" != "" ] ; then
> elif [ "$MAHOUT_LOCAL" != "" ] ; then
> echo "MAHOUT_LOCAL is set, running locally"
> fi
> - exec "$JAVA" $JAVA_HEAP_MAX $MAHOUT_OPTS -classpath "$CLASSPATH" $CLASS "$@"
> + exec "$JAVA" $JAVA_HEAP_MAX $JAVA_PROPERTIES $MAHOUT_OPTS -classpath "$CLASSPATH" $CLASS "$@"
> else
> echo "Running on hadoop, using HADOOP_HOME=$HADOOP_HOME"
> if [ "$HADOOP_CONF_DIR" = "" ] ; then
> @@ -213,7 +221,7 @@ else
> exit 1
> else
> export HADOOP_CLASSPATH=$MAHOUT_CONF_DIR:${HADOOP_CLASSPATH}
> - exec "$HADOOP_HOME/bin/hadoop" jar $MAHOUT_JOB $CLASS "$@"
> + exec "$HADOOP_HOME/bin/hadoop" jar $MAHOUT_JOB $CLASS "$@" $JAVA_PROPERTIES
> fi
> fi
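
For illustration only, the option-splitting loop from the quoted patch can be
sketched as a standalone script. This is a minimal sketch, not the code in
bin/mahout: the sample arguments are assumptions chosen for the demo (the
"-i" flag and the "input dir" path are hypothetical), and a "case" pattern
replaces the original "[ $1 ] && [ ${1:0:2} == "-D" ]" test so that the loop
also runs under a plain POSIX sh and tolerates arguments containing spaces.

```shell
#!/bin/sh
# Sample command line for the demo (assumed values, not from bin/mahout).
set -- "-Dmapred.job.queue.name=unfunded" "-Dmapred.create.symlink=yes" \
       seq2sparse -i "input dir"

# Collect leading -D options, in their original order, until the first
# argument that does not start with -D.
JAVA_PROPERTIES=""
while [ "$#" -gt 0 ]; do
  case "$1" in
    -D*)
      # Append with a single separating space (none before the first entry).
      JAVA_PROPERTIES="${JAVA_PROPERTIES:+$JAVA_PROPERTIES }$1"
      shift
      ;;
    *)
      break
      ;;
  esac
done

# "$@" now holds only the remaining program arguments.
echo "properties: $JAVA_PROPERTIES"
echo "remaining: $# args, first is $1"
```

Unlike the quoted patch, which prepends each option and therefore reverses
the order, this sketch appends, so the options keep the order in which they
were given. The collected string still relies on word splitting when it is
expanded unquoted, so it would need care in a script that has cleared IFS.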
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.