I have faced this problem in the past; the solution was to bundle the
analyzer jar inside the job's jar [1] so that the analyzer is available on
the cluster nodes.

[1]
http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/

On Wed, Jul 20, 2011 at 10:53 AM, Grant Ingersoll <[email protected]> wrote:

> I'm trying to understand a bit what our preferred mechanism is for users to
> add custom libraries to the Mahout classpath when running on Hadoop.  The
> obvious case that comes to mind is adding your own Lucene Analyzer, which is
> what I am trying to do.
>
> In looking at bin/mahout, we define CLASSPATH, in the non-core case to be:
> # add release dependencies to CLASSPATH
>  for f in $MAHOUT_HOME/mahout-*.jar; do
>    CLASSPATH=${CLASSPATH}:$f;
>  done
>
>  # add dev targets if they exist
>  for f in $MAHOUT_HOME/*/target/mahout-examples-*-job.jar; do
>    CLASSPATH=${CLASSPATH}:$f;
>  done
>
>  # add release dependencies to CLASSPATH
>  for f in $MAHOUT_HOME/lib/*.jar; do
>    CLASSPATH=${CLASSPATH}:$f;
>  done
>
> From the looks of it, I could, on trunk, add in a lib directory and just
> shove my dependency into that dir.
>
> However, further down, we don't seem to use that CLASSPATH, except when in
> LOCAL mode or "hadoop" mode:
> if [ "$1" = "hadoop" ]; then
>      export HADOOP_CLASSPATH=$MAHOUT_CONF_DIR:${HADOOP_CLASSPATH}:$CLASSPATH
>      exec "$HADOOP_HOME/bin/$@"
> else
>      echo "MAHOUT-JOB: $MAHOUT_JOB"
>      export HADOOP_CLASSPATH=$MAHOUT_CONF_DIR:${HADOOP_CLASSPATH}
>      exec "$HADOOP_HOME/bin/hadoop" --config $HADOOP_CONF_DIR jar $MAHOUT_JOB $CLASS "$@"
> fi
>
> So this means I should force "hadoop" mode by doing:
> ./bin/mahout hadoop \
>   org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles ... \
>   --analyzerName my.great.Analyzer
>
> instead of:
> ./bin/mahout seq2sparse ...
>
> However, I still get a ClassNotFoundException even though, when I echo
> $HADOOP_CLASSPATH, my jar is in there and the jar contains my Analyzer.
>
> Any insight?
>
> --------------------------
> Grant Ingersoll
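
For what it's worth, exporting HADOOP_CLASSPATH only affects the client
JVM that submits the job; the task JVMs on the worker nodes never see it,
which is why the ClassNotFoundException persists even though the echo shows
the jar. Besides repacking the job jar, the -libjars route from the article
should also work, provided the driver parses its options through
ToolRunner/GenericOptionsParser (worth double-checking for
SparseVectorsFromSequenceFiles). A hypothetical invocation, with
placeholder jar paths:

```shell
# Hypothetical; requires a configured Hadoop client. -libjars ships the
# listed jars to the distributed cache and adds them to the task classpath.
# Generic options such as -libjars must precede the tool's own arguments.
hadoop jar mahout-examples-job.jar \
  org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles \
  -libjars /path/to/my-analyzer.jar \
  ... --analyzerName my.great.Analyzer
```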
