Gangadhar,

I modified $MAHOUT_HOME/examples/src/test/resources/country.txt to have just 1 entry (spain), used WikipediaDatasetCreatorDriver to create the wikipediainput data set, then ran TrainClassifier, and it worked. When I ran TestClassifier as below, I got blank results in the output:
$MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job org.apache.mahout.classifier.bayes.TestClassifier -m wikipediamodel -d wikipediainput -ng 3 -type bayes -source hdfs

Summary
-------------------------------------------------------
Correctly Classified Instances          :          0         ?%
Incorrectly Classified Instances        :          0         ?%
Total Classified Instances              :          0

=======================================================
Confusion Matrix
-------------------------------------------------------
a    <--Classified as
0    |  0    a     = spain
Default Category: unknown: 1

I am not sure whether I am doing something wrong; I still have to figure out why the output is so blank. I'll document these steps in the wiki and mention country.txt there.

A question to all: should we have two country lists?
1. country_full_list.txt - the existing list
2. country_sample_list.txt - a list with 2 or 3 countries

New users who just want to get a flavor of the Wikipedia Bayes example can pass country_sample_list.txt as a parameter. To run the example on a robust, scalable infrastructure, we could use country_full_list.txt.

Any thoughts?

Regards,
Joe.

On Sat, Sep 18, 2010 at 8:57 PM, Joe Kumar wrote:
> Gangadhar,
>
> After running TrainClassifier again, the map task failed with the same
> exception, and I am pretty sure it is a disk space issue.
> As the map was progressing, I watched my free disk space drop from 81GB.
> It reached 0 about 66% of the way through the map task, and then the
> exception happened. After the exception, another map task resumed at 33%
> and free space came back to about 15GB (I guess the first map task freed
> up some space), but I am sure it will drop to zero again and throw the
> same exception.
> I am going to modify country.txt to have just 1 country, recreate
> wikipediainput, and run TrainClassifier. Will let you know how it goes.
>
> Do we have any benchmarks or system requirements for running this example?
> Has anyone else run this example successfully? I would appreciate
> your inputs and thoughts.
>
> Should we look at tuning the code to handle these situations? Any quick
> suggestions on where to start looking?
>
> Regards,
> Joe.
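If we go with two lists, the sample list could be derived from the full one instead of being maintained by hand. Below is a minimal sketch under stated assumptions: it assumes the list keeps one category per line, as country.txt appears to (the single-entry file produced the single category spain), and make_sample_list is a hypothetical helper, not something in Mahout. It simply keeps the first N non-blank entries.

```shell
#!/bin/sh
# Sketch: build a small sample category list from the full one.
# Assumption: one category per line, as in country.txt. The file names are the
# ones proposed in this thread; make_sample_list is a hypothetical helper.

# make_sample_list FULL SAMPLE N
#   Writes the first N non-blank lines of FULL to SAMPLE.
make_sample_list() {
  full=$1
  sample=$2
  n=$3
  # Drop empty or whitespace-only lines, then keep the first N entries.
  grep -v '^[[:space:]]*$' "$full" | head -n "$n" > "$sample"
}
```

For example, `make_sample_list country_full_list.txt country_sample_list.txt 3` would keep the first three entries, and the resulting file could then be given to WikipediaDatasetCreatorDriver in place of country.txt. Taking the head of the list is just the simplest deterministic choice; hand-picking a few countries with good Wikipedia coverage might make a more interesting demo.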
