Gangadhar,

Just to eliminate the usual suspects: I am using Mac OS X 10.5.8, Mahout 0.4 (revision 986659), and Hadoop 0.20.2, with 2 GB of memory for Hadoop and 80 GB of free disk space.

I had issues with my namenode, so I reformatted it with hadoop namenode -format. $MAHOUT_HOME/examples/src/test/resources/country.txt had just 1 entry (spain); I haven't tried with multiple entries. These are the commands that I executed:

$> hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job org.apache.mahout.classifier.bayes.WikipediaXmlSplitter -d $MAHOUT_HOME/examples/temp/enwiki-latest-pages-articles10.xml -o wikipedia/chunks -c 64
$> hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver -i wikipedia/chunks -o wikipediainput -c $MAHOUT_HOME/examples/src/test/resources/country.txt
$> hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job org.apache.mahout.classifier.bayes.TrainClassifier -i wikipediainput -o wikipediamodel -type bayes -source hdfs
$> hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job org.apache.mahout.classifier.bayes.TestClassifier -m wikipediamodel -d wikipediainput -ng 3 -type bayes -source hdfs

Please try the above and let me know; we'll try to find out what is going wrong.

Regards,
Joe.

On Sun, Sep 19, 2010 at 11:13 PM, Gangadhar Nittala <[email protected]> wrote:
> Joe,
> Even I tried reducing the number of countries in country.txt.
> That didn't help, and in my case I was monitoring the disk space and
> at no time did it reach 0%, so I am not sure if that is the cause. To
> remove the dependency on the number of countries, I even tried with
> subjects.txt as the classification - that also did not help.
> I think this problem is due to the type of the data being processed,
> but what I am not sure of is what I need to change to get the data to
> be processed successfully.
>
> The experienced folks on Mahout will be able to tell us what is missing, I
> guess.
>
> Thank you
> Gangadhar
>
> On Sun, Sep 19, 2010 at 8:06 AM, Joe Kumar <[email protected]> wrote:
> > Gangadhar,
> >
> > I modified $MAHOUT_HOME/examples/src/test/resources/country.txt to just have
> > 1 entry (spain) and used WikipediaDatasetCreatorDriver to create the
> > wikipediainput data set, and then ran TrainClassifier and it worked. When I
> > ran TestClassifier as below, I got blank results in the output.
> >
> > $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job
> > org.apache.mahout.classifier.bayes.TestClassifier -m wikipediamodel -d
> > wikipediainput -ng 3 -type bayes -source hdfs
> >
> > Summary
> > -------------------------------------------------------
> > Correctly Classified Instances   : 0  ?%
> > Incorrectly Classified Instances : 0  ?%
> > Total Classified Instances       : 0
> >
> > =======================================================
> > Confusion Matrix
> > -------------------------------------------------------
> > a    <--Classified as
> > 0 | 0    a = spain
> > Default Category: unknown: 1
> >
> > I am not sure if I am doing something wrong; I have to figure out why my
> > output is so blank.
> > I'll document these steps and mention country.txt in the wiki.
> >
> > Question to all:
> > Should we have 2 country.txt files?
> >
> > 1. country_full_list.txt - this is the existing list
> > 2. country_sample_list.txt - a list with 2 or 3 countries
> >
> > To get a flavor of the wikipedia bayes example, we can use
> > country_sample_list.txt. When new people want to just try out the example,
> > they can reference this txt file as a parameter.
> > To run the example on a robust, scalable infrastructure, we could use
> > country_full_list.txt.
> > Any thoughts?
> >
> > regards,
> > Joe.
> >
> > On Sat, Sep 18, 2010 at 8:57 PM, Joe Kumar <[email protected]> wrote:
> >
> >> Gangadhar,
> >>
> >> After running TrainClassifier again, the map task just failed with the same
> >> exception, and I am pretty sure it is an issue with disk space.
> >> As the map was progressing, I was monitoring my free disk space dropping
> >> from 81 GB. It came down to 0 after almost 66% of the map task, and then
> >> the exception happened. After the exception, another map task resumed
> >> at 33% and I got close to 15 GB of free space (I guess the first map task
> >> freed up some space), and I am sure it would drop down to zero again and
> >> throw the same exception.
> >> I am going to modify country.txt to just 1 country, recreate
> >> wikipediainput, and run TrainClassifier. Will let you know how it goes.
> >>
> >> Do we have any benchmarks / system requirements for running this example?
> >> Has anyone else had success running this example? Would appreciate
> >> your inputs / thoughts.
> >>
> >> Should we look at tuning the code to handle these situations? Any quick
> >> suggestions on where to start looking?
> >>
> >> regards,
> >> Joe.
> >>
>
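[Editor's note: the single-country run Joe describes can be collected into one sketch of a script. The paths, job file name, and command-line flags are taken verbatim from the thread; the script itself is an untested convenience wrapper, and the Hadoop steps are guarded so the country.txt preparation can be tried on its own.]

```shell
# Sketch of the single-country run from this thread (Mahout 0.4, Hadoop 0.20.2).
# Assumes MAHOUT_HOME is set; falls back to the current directory otherwise.

# Step 0: shrink country.txt to one entry (spain), as Joe did.
COUNTRY_FILE="${MAHOUT_HOME:-.}/examples/src/test/resources/country.txt"
mkdir -p "$(dirname "$COUNTRY_FILE")"
printf 'spain\n' > "$COUNTRY_FILE"

# Steps 1-4 need a running Hadoop cluster; skip them when hadoop is absent.
if command -v hadoop >/dev/null 2>&1; then
  JOB="$MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job"
  hadoop jar "$JOB" org.apache.mahout.classifier.bayes.WikipediaXmlSplitter \
    -d "$MAHOUT_HOME/examples/temp/enwiki-latest-pages-articles10.xml" \
    -o wikipedia/chunks -c 64
  hadoop jar "$JOB" org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver \
    -i wikipedia/chunks -o wikipediainput -c "$COUNTRY_FILE"
  hadoop jar "$JOB" org.apache.mahout.classifier.bayes.TrainClassifier \
    -i wikipediainput -o wikipediamodel -type bayes -source hdfs
  hadoop jar "$JOB" org.apache.mahout.classifier.bayes.TestClassifier \
    -m wikipediamodel -d wikipediainput -ng 3 -type bayes -source hdfs
else
  echo "hadoop not found; only country.txt was prepared" >&2
fi
```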

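[Editor's note: since the failures correlate with local disk filling up mid-map, a crude watcher like the one below can log free space while TrainClassifier runs. It uses only plain df, nothing Hadoop-specific; the mount point, interval, and sample count are illustrative choices, not values from the thread.]

```shell
# Periodically log available space on a given mount point, e.g. the partition
# holding Hadoop's temporary data. Arguments: mount point, interval (s), samples.
monitor_free_space() {
  mount_point="${1:-/}"
  interval="${2:-5}"
  count="${3:-3}"
  i=0
  while [ "$i" -lt "$count" ]; do
    # df -P gives portable (POSIX) output; with -k, field 4 is available KB.
    avail_kb=$(df -Pk "$mount_point" | awk 'NR==2 {print $4}')
    echo "$(date '+%H:%M:%S') free=${avail_kb}KB on $mount_point"
    i=$((i + 1))
    sleep "$interval"
  done
}

# Example: three samples, one second apart, on the root partition.
monitor_free_space / 1 3
```

Running this in a second terminal during the map phase would show whether free space really hits zero right before the exception, which is what Joe observed by eye.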