I ran through all of the large example scripts using all of their options: cluster-reuters.sh, classify-20newsgroups.sh, classify-wikipedia.sh, cluster-synthetic.sh, and /examples/bin/run-rf.sh 1000, in both MAHOUT_LOCAL=true and MAHOUT_LOCAL-unset (cluster) modes.

I also ran factorize-movielens-1M.sh (which uses MAHOUT_LOCAL=true only) and spark-document-classifier.mscala (a mahout-shell script).

Setup:
Hadoop 2.4.1 pseudo-cluster using the default config from the Hadoop configuration page.
Pre-compiled spark-1.1.1-bin-hadoop2.4 binaries, downloaded.
$MASTER env variable set, pointing to the Spark master URL.
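For reference, the environment for the cluster-mode runs was set up roughly like this (the install paths and master URL below are illustrative, not copied from my actual setup):

```shell
# Illustrative environment setup for cluster-mode runs.
# Paths and the master host/port are placeholders; adjust to your install.
export HADOOP_HOME=/opt/hadoop-2.4.1
export SPARK_HOME=/opt/spark-1.1.1-bin-hadoop2.4
export MASTER=spark://localhost:7077   # Spark master URL

# MAHOUT_LOCAL must be unset for the scripts to run against the cluster.
unset MAHOUT_LOCAL
```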


Current status is:

MAHOUT_LOCAL unset:
  cluster-reuters -> option (2) needs more YARN heap memory (noted in the script)
  classify-wikipedia -> option (1) needs more YARN heap memory (noted in the script)

MAHOUT_LOCAL=true:
  cluster-reuters -> option (1) fails due to local-vs-cluster script issues; added a note to the script: "(runs from this example script in cluster mode only)"
  classify-20newsgroups -> options (3) and (4) now exit gracefully with a message that MAHOUT_LOCAL=true can not be set; similarly if $MASTER is not set.
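The graceful-exit behavior amounts to a small environment check near the top of the script. A minimal sketch follows; the exact wording and placement in classify-20newsgroups.sh may differ, and the function name check_cluster_env is hypothetical:

```shell
# Sketch of the environment guard: refuse to run a cluster-only example
# when MAHOUT_LOCAL=true is set, or when $MASTER is missing.
check_cluster_env() {
  if [ "$MAHOUT_LOCAL" = "true" ]; then
    echo "MAHOUT_LOCAL=true can not be set for this example; unset it and rerun."
    return 1
  fi
  if [ -z "$MASTER" ]; then
    echo "\$MASTER is not set; export it with your Spark master URL first."
    return 1
  fi
}
```

A script would call it as `check_cluster_env || exit 1` before doing any work.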

If anyone has problems running cluster-reuters.sh (this happens sometimes, e.g. if the download doesn't complete), you should just need to delete your /tmp/mahout-work-$user directory and run it again.
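Concretely, the recovery is just (assuming you run the script from the Mahout source root; the path is the one the examples use):

```shell
# Remove the per-user work directory left by a failed run, then rerun.
rm -rf /tmp/mahout-work-$USER
./examples/bin/cluster-reuters.sh
```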
