[ https://issues.apache.org/jira/browse/MAHOUT-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051527#comment-13051527 ]
Frank Scholten commented on MAHOUT-680:
---------------------------------------
The problem is that there are two job jars, one from the core module and one
from the examples module, and whichever comes first on the classpath is the one
used to look up dependencies when running the job.
I echoed the CLASSPATH and the core job jar is listed first. So if you copy
DefaultAnalyzer to a class called Mahout680Analyzer and put it in the examples
module, you get a ClassNotFoundException when running seq2sparse like this:
{code}
$ bin/mahout seq2sparse --input output --output output-seq2sparse \
    --analyzerName org.apache.mahout.analysis.Mahout680Analyzer
{code}
I tested this on a test cluster a few minutes ago. When I moved the
Mahout680Analyzer custom analyzer to the core module, it worked.
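To check which job jar the JVM will search first, a small helper along these lines can list the job jars in a CLASSPATH string in lookup order. This is only a sketch: the function name job_jars_in_order is hypothetical (it is not part of bin/mahout), and it assumes the job jars follow the mahout-*-job.jar naming used in the script.

{code}
# Hypothetical helper (not part of bin/mahout): print every *-job.jar entry
# of a colon-separated classpath, in the order the JVM will search them.
job_jars_in_order() {
  printf '%s\n' "$1" | tr ':' '\n' | grep -e '-job\.jar$'
}
{code}

Calling job_jars_in_order "$CLASSPATH" just before the exec line would print one jar per line; with the core job jar listed first, as described above, mahout-core-*-job.jar is the first line.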
The mahout script adds both job jars to the classpath with the following
snippet:
{code}
# add dev targets if they exist
for f in $MAHOUT_HOME/*/target/mahout-*-job.jar; do
  CLASSPATH=${CLASSPATH}:$f;
done
{code}
but it uses only the examples job jar to launch your Mahout job. See this snippet:
{code}
for f in $MAHOUT_HOME/examples/target/mahout-examples-*-job.jar; do
  if [ -e "$f" ]; then
    MAHOUT_JOB=$f
  fi
done
if [ "$MAHOUT_JOB" = "" ]; then
  for f in $MAHOUT_HOME/mahout-examples-*-job.jar; do
    if [ -e "$f" ]; then
      MAHOUT_JOB=$f
    fi
  done
fi
{code}
and
{code}
exec "$HADOOP_HOME/bin/hadoop" jar $MAHOUT_JOB $CLASS "$@"
{code}
So the core job jar is never passed to "hadoop jar" by the mahout script. Is
this a problem? Maybe the classpath snippet above should be changed so that only
the examples job jar is added to the classpath? Any thoughts?
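A rough sketch of that suggestion, assuming the job jar lookup can run before the classpath is assembled: choose MAHOUT_JOB with the same two loops as above, then append only that jar to CLASSPATH instead of every */target/mahout-*-job.jar. The function name select_examples_job is hypothetical, not something bin/mahout defines.

{code}
# Hypothetical replacement for the "add dev targets" loop: choose the
# examples job jar the same way bin/mahout picks MAHOUT_JOB, then put only
# that jar on the CLASSPATH so the core job jar cannot shadow it.
select_examples_job() {
  local f
  MAHOUT_JOB=""
  # dev build first (the last matching jar wins, as in the original loop)
  for f in "$MAHOUT_HOME"/examples/target/mahout-examples-*-job.jar; do
    if [ -e "$f" ]; then
      MAHOUT_JOB=$f
    fi
  done
  # fall back to the binary distribution layout
  if [ "$MAHOUT_JOB" = "" ]; then
    for f in "$MAHOUT_HOME"/mahout-examples-*-job.jar; do
      if [ -e "$f" ]; then
        MAHOUT_JOB=$f
      fi
    done
  fi
  # append only the jar that "hadoop jar" will actually launch
  if [ -n "$MAHOUT_JOB" ]; then
    CLASSPATH=${CLASSPATH}:$MAHOUT_JOB
  fi
}
{code}

This keeps the client classpath consistent with the jar passed to "hadoop jar $MAHOUT_JOB", under the assumption that the examples job jar bundles everything needed from core.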
> Running the Hadoop script through bin/mahout to set up classpath
> ----------------------------------------------------------------
>
> Key: MAHOUT-680
> URL: https://issues.apache.org/jira/browse/MAHOUT-680
> Project: Mahout
> Issue Type: Improvement
> Affects Versions: 0.4
> Reporter: Frank Scholten
> Priority: Minor
> Fix For: 0.5
>
> Attachments: MAHOUT-680.patch, MAHOUT-680.patch, jobtracker.jsp.html
>
>
> Added a patch which allows you to run the $HADOOP_HOME/bin/hadoop command
> script through the bin/mahout script.
> This way the Mahout script adds the Mahout classes to the $HADOOP_CLASSPATH,
> so you can view SequenceFiles generated by Mahout jobs with
> bin/mahout hadoop fs -text <sequencefile>
> without having to specify the Mahout classes manually or running into
> ClassNotFoundExceptions.
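For illustration only, a dispatch along these lines could produce the behaviour described above; it is not taken from MAHOUT-680.patch, and the function name run_hadoop_with_mahout_classpath is hypothetical. It assumes CLASSPATH already holds the Mahout jars and HADOOP_HOME points at a Hadoop install, and relies on bin/hadoop appending HADOOP_CLASSPATH to its own classpath.

{code}
# Hypothetical sketch, not the attached patch: when the first argument is
# "hadoop", prepend the Mahout CLASSPATH to HADOOP_CLASSPATH and pass the
# remaining arguments (e.g. "fs -text <sequencefile>") to bin/hadoop.
run_hadoop_with_mahout_classpath() {
  [ "$1" = "hadoop" ] || return 1
  shift
  HADOOP_CLASSPATH="${CLASSPATH}${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}" \
    "$HADOOP_HOME/bin/hadoop" "$@"
}
{code}

Inside bin/mahout itself the call would more likely use exec, as the "hadoop jar" line does; the function form here just makes the idea easy to try in isolation.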
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira