I have faced this problem in the past. The solution was to add the analyzer jar to the job's jar [1], so that the analyzer is available on the cluster nodes.
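As far as I understand it, HADOOP_CLASSPATH only affects the JVM that submits the job, which would explain a Class Not Found in the tasks even when the jar shows up in the echoed classpath. Hadoop does put any jars found under lib/ inside the job jar on the task classpath, so one way to do the bundling is roughly the sketch below. The function name add_to_job_jar, the JAR override variable, and the analyzer jar path are made up for illustration; only $MAHOUT_JOB comes from bin/mahout. It assumes the JDK's jar tool is on the PATH, and you may want to work on a copy of the job jar since it is modified in place.

```shell
#!/bin/sh
# Sketch (hypothetical helper): copy an extra jar into the lib/ directory
# of an existing Hadoop job jar, updating the job jar in place.
# Usage: add_to_job_jar JOB_JAR EXTRA_JAR
add_to_job_jar() {
  job_jar=$1
  extra_jar=$2

  [ -f "$job_jar" ] || { echo "job jar not found: $job_jar" >&2; return 1; }
  [ -f "$extra_jar" ] || { echo "extra jar not found: $extra_jar" >&2; return 1; }

  # The jar tool runs from a staging directory, so make the path absolute.
  case $job_jar in
    /*) ;;
    *) job_jar=$PWD/$job_jar ;;
  esac

  staging=$(mktemp -d) || return 1

  # Stage the extra jar as lib/<name>.jar, then add that entry to the job jar.
  # JAR is an assumed override for the jar executable; it defaults to "jar".
  mkdir "$staging/lib" &&
    cp "$extra_jar" "$staging/lib/" &&
    (cd "$staging" && "${JAR:-jar}" uf "$job_jar" "lib/$(basename "$extra_jar")")
  status=$?

  rm -rf "$staging"
  return $status
}
```

With that defined, something like add_to_job_jar "$MAHOUT_JOB" /path/to/my-analyzer.jar (the path being a placeholder) should leave the analyzer under lib/ in the job jar, after which the plain ./bin/mahout seq2sparse ... invocation can be tried again.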
[1] http://www.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/

On Wed, Jul 20, 2011 at 10:53 AM, Grant Ingersoll <[email protected]> wrote:

> I'm trying to understand a bit what our preferred mechanism is for users to
> add custom libraries to the Mahout classpath when running on Hadoop. The
> obvious case that comes to mind is adding your own Lucene Analyzer, which is
> what I am trying to do.
>
> In looking at bin/mahout, we define CLASSPATH, in the non-core case to be:
>   # add release dependencies to CLASSPATH
>   for f in $MAHOUT_HOME/mahout-*.jar; do
>     CLASSPATH=${CLASSPATH}:$f;
>   done
>
>   # add dev targets if they exist
>   for f in $MAHOUT_HOME/*/target/mahout-examples-*-job.jar; do
>     CLASSPATH=${CLASSPATH}:$f;
>   done
>
>   # add release dependencies to CLASSPATH
>   for f in $MAHOUT_HOME/lib/*.jar; do
>     CLASSPATH=${CLASSPATH}:$f;
>   done
>
> From the looks of it, I could, on trunk, add in a lib directory and just
> shove my dependency into that dir.
>
> However, further down, we don't seem to use that CLASSPATH, except when in
> LOCAL mode or "hadoop" mode:
>   if [ "$1" = "hadoop" ]; then
>     export HADOOP_CLASSPATH=$MAHOUT_CONF_DIR:${HADOOP_CLASSPATH}:$CLASSPATH
>     exec "$HADOOP_HOME/bin/$@"
>   else
>     echo "MAHOUT-JOB: $MAHOUT_JOB"
>     export HADOOP_CLASSPATH=$MAHOUT_CONF_DIR:${HADOOP_CLASSPATH}
>     exec "$HADOOP_HOME/bin/hadoop" --config $HADOOP_CONF_DIR jar $MAHOUT_JOB $CLASS "$@"
>   fi
>
> So this means, I should force "hadoop" mode by doing:
>   ./bin/mahout hadoop org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles ... --analyzerName my.great.Analyzer
>
> instead of:
>   ./bin/mahout seq2sparse ...
>
> However, I still get Class Not Found even though when I echo the
> $HADOOP_CLASSPATH my jar is in there and the jar contains my Analyzer.
>
> Any insight?
>
> --------------------------
> Grant Ingersoll
