I ran through all of the large example scripts using all of their options: cluster-reuters.sh, classify-20newsgroups.sh, classify-wikipedia.sh, cluster-synthetic.sh, and /examples/bin/run-rf.sh 1000, in both MAHOUT_LOCAL=true and MAHOUT_LOCAL-unset (cluster) modes.

I also ran factorize-movielens-1M.sh (which uses MAHOUT_LOCAL=true only) and spark-document-classifier.mscala (a mahout-shell script).

Setup:
Hadoop 2.4.1 pseudo-cluster using the default config from the Hadoop configuration page.
Pre-compiled spark-1.1.1-bin-hadoop2.4 binaries, downloaded.
$MASTER env variable set, pointing to the Spark master URL.
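For reference, the environment for the cluster-mode runs was set up roughly like this (the install paths and master URL below are illustrative, not copied from my actual setup):

```shell
# Illustrative environment setup for cluster-mode runs.
# Paths and the master host/port are placeholders; adjust to your install.
export HADOOP_HOME=/opt/hadoop-2.4.1
export SPARK_HOME=/opt/spark-1.1.1-bin-hadoop2.4
export MASTER=spark://localhost:7077   # Spark master URL

# MAHOUT_LOCAL must be unset for the scripts to run against the cluster.
unset MAHOUT_LOCAL
```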


Current status is:

MAHOUT_LOCAL unset:
  cluster-reuters -> option (2) needs more YARN heap memory (noted in the script)
  classify-wikipedia -> option (1) needs more YARN heap memory (noted in the script)

MAHOUT_LOCAL=true:
  cluster-reuters -> option (1) fails due to local-vs-cluster script issues; added a note to the script: "(runs from this example script in cluster mode only)"
  classify-20newsgroups -> options (3) and (4) now exit gracefully with a message that MAHOUT_LOCAL=true can not be set; similarly if $MASTER is not set.
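The graceful-exit behavior amounts to a small environment check near the top of the script. A minimal sketch follows; the exact wording and placement in classify-20newsgroups.sh may differ, and the function name check_cluster_env is hypothetical:

```shell
# Sketch of the environment guard: refuse to run a cluster-only example
# when MAHOUT_LOCAL=true is set, or when $MASTER is missing.
check_cluster_env() {
  if [ "$MAHOUT_LOCAL" = "true" ]; then
    echo "MAHOUT_LOCAL=true can not be set for this example; unset it and rerun."
    return 1
  fi
  if [ -z "$MASTER" ]; then
    echo "\$MASTER is not set; export it with your Spark master URL first."
    return 1
  fi
}
```

A script would call it as `check_cluster_env || exit 1` before doing any work.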

If anyone has problems running cluster-reuters.sh (this happens sometimes, e.g. if the download doesn't complete), you should just need to delete your /tmp/mahout-work-$user directory and run it again.
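Concretely, the recovery is just (assuming you run the script from the Mahout source root; the path is the one the examples use):

```shell
# Remove the per-user work directory left by a failed run, then rerun.
rm -rf /tmp/mahout-work-$USER
./examples/bin/cluster-reuters.sh
```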
