[ 
https://issues.apache.org/jira/browse/MAHOUT-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051527#comment-13051527
 ] 

Frank Scholten commented on MAHOUT-680:
---------------------------------------

The problem is that there are two job jars, one from the core module and one 
from the examples module, and whichever comes first on the classpath is used 
to look up dependencies when running the job.

I echoed the CLASSPATH and the core job jar is listed first. So if you copy 
DefaultAnalyzer to a class called Mahout680Analyzer and put it in the examples 
module, you get a ClassNotFoundException when running seq2sparse like this:

{code}
$ bin/mahout seq2sparse --input output --output output-seq2sparse \
    --analyzerName org.apache.mahout.analysis.Mahout680Analyzer
{code}
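
For anyone who wants to repeat the CLASSPATH check, here is a minimal sketch that prints the entries one per line in lookup order. The helper name list_classpath and the jar paths are made up for illustration; they are not part of the mahout script.

```shell
# Print each classpath entry on its own numbered line, in lookup order.
# list_classpath is a hypothetical helper, not part of bin/mahout.
list_classpath() {
  printf '%s\n' "$1" | tr ':' '\n' | grep -n .
}

# Hypothetical paths; in practice you would pass "$CLASSPATH".
list_classpath "/opt/mahout/core/target/mahout-core-job.jar:/opt/mahout/examples/target/mahout-examples-job.jar"
```

The entry with the lowest number is the one searched first.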

I tested this out on a test cluster a few minutes ago. When I moved the 
Mahout680Analyzer custom analyzer to the core module it did work.

The mahout script adds both job jars to the classpath with the following 
snippet:

{code}
  # add dev targets if they exist
  for f in $MAHOUT_HOME/*/target/mahout-*-job.jar; do
    CLASSPATH=${CLASSPATH}:$f;
  done
{code}
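
A likely reason the core job jar ends up first: the shell expands glob matches in sorted order, so core/ sorts ahead of examples/. The sketch below reproduces this in a throwaway directory. The directory layout and the 0.5 version in the file names are assumptions made for the demo.

```shell
# Build a temporary layout that mimics a Mahout checkout.
# The jar file names (including the 0.5 version) are hypothetical.
demo_home=$(mktemp -d)
mkdir -p "$demo_home/core/target" "$demo_home/examples/target"
touch "$demo_home/examples/target/mahout-examples-0.5-job.jar"
touch "$demo_home/core/target/mahout-core-0.5-job.jar"

# Same loop shape as the bin/mahout snippet; glob matches arrive sorted.
demo_cp=""
for f in "$demo_home"/*/target/mahout-*-job.jar; do
  demo_cp="${demo_cp}:$f"
done

# Drop the leading colon and show the entries in lookup order.
printf '%s\n' "${demo_cp#:}" | tr ':' '\n'

rm -rf "$demo_home"
```

Even though the examples jar is created first here, the core jar is listed first in the output.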

but it uses the examples job jar to launch your Mahout job. See snippet:

{code}
for f in $MAHOUT_HOME/examples/target/mahout-examples-*-job.jar; do
  if [ -e "$f" ]; then
    MAHOUT_JOB=$f
  fi
done

if [ "$MAHOUT_JOB" = "" ]; then
  for f in $MAHOUT_HOME/mahout-examples-*-job.jar; do
    if [ -e "$f" ]; then
      MAHOUT_JOB=$f
    fi
  done
fi
{code}

and 

{code}
exec "$HADOOP_HOME/bin/hadoop" jar $MAHOUT_JOB $CLASS "$@"
{code}

So the core job jar is never used by the mahout script to launch a job. Is 
this a problem? Maybe the classpath snippet above should be changed so that 
only the examples job jar is added to the classpath? Any thoughts?
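
To make the suggestion concrete, this is roughly what I have in mind. It is an untested sketch, not the attached patch, and the function name add_examples_job_jars is invented here for illustration.

```shell
# Sketch: append only the examples job jar(s) to a classpath string.
# add_examples_job_jars is a hypothetical helper, not part of bin/mahout.
#   $1: Mahout home directory
#   $2: existing classpath
add_examples_job_jars() (
  cp=$2
  for f in "$1"/examples/target/mahout-examples-*-job.jar; do
    # Skip the unexpanded pattern when no jar matches.
    if [ -e "$f" ]; then
      cp="${cp}:$f"
    fi
  done
  printf '%s\n' "$cp"
)

# Possible use inside the script:
#   CLASSPATH=$(add_examples_job_jars "$MAHOUT_HOME" "$CLASSPATH")
```

With this, the core job jar would no longer shadow classes that only exist in the examples job jar. Whether anything else relies on the core job jar being on the classpath is something I have not checked.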

> Running the Hadoop script through bin/mahout to set up classpath
> ----------------------------------------------------------------
>
>                 Key: MAHOUT-680
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-680
>             Project: Mahout
>          Issue Type: Improvement
>    Affects Versions: 0.4
>            Reporter: Frank Scholten
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: MAHOUT-680.patch, MAHOUT-680.patch, jobtracker.jsp.html
>
>
> Added a patch which allows you to run the $HADOOP_HOME/bin/hadoop command 
> script through the bin/mahout script.
> This way the Mahout script adds the Mahout classes to the $HADOOP_CLASSPATH 
> so you can view sequencefiles generated by Mahout jobs with
> bin/mahout hadoop fs -text <sequencefile>
> without having to specify Mahout classes manually or getting 
> ClassNotFoundExceptions

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
