I don't know if it's related, but I remember hitting a similar exception a year ago while working on the Random Forests implementation. In my case it was caused by SequenceFile.Sorter.merge(). I ended up writing my own merge function because I didn't actually need the output sorted.
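For reference, a sort-free merge of that kind can be sketched roughly as below. This is only an illustration using the Hadoop 0.20-era SequenceFile API, not the actual Random Forests code; the class name and paths are made up, and it needs a Hadoop installation on the classpath to compile.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

/**
 * Sketch: concatenate several SequenceFiles into one without sorting,
 * avoiding SequenceFile.Sorter.merge(). Illustrative only.
 */
public class UnsortedMerge {

  public static void merge(Configuration conf, Path[] inputs, Path output)
      throws IOException {
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Writer writer = null;
    try {
      for (Path in : inputs) {
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, in, conf);
        try {
          Writable key = (Writable)
              ReflectionUtils.newInstance(reader.getKeyClass(), conf);
          Writable value = (Writable)
              ReflectionUtils.newInstance(reader.getValueClass(), conf);
          if (writer == null) {
            // create the output with the same key/value classes as the inputs
            writer = SequenceFile.createWriter(fs, conf, output,
                reader.getKeyClass(), reader.getValueClass());
          }
          // copy records in input order; no sort, so no Sorter.merge()
          while (reader.next(key, value)) {
            writer.append(key, value);
          }
        } finally {
          reader.close();
        }
      }
    } finally {
      if (writer != null) {
        writer.close();
      }
    }
  }
}
```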
On Mon, Sep 20, 2010 at 6:14 AM, Joe Kumar <[email protected]> wrote:
> Gangadhar,
>
> Just to eliminate the usual suspects: I am using Mac OS X 10.5.8, Mahout 0.4
> (revision 986659), Hadoop 0.20.2, 2 GB of memory for Hadoop, and 80 GB of
> free space. These are the commands that I executed.
>
> I had issues with my namenode, so I reformatted it with hadoop namenode
> -format.
> $MAHOUT_HOME/examples/src/test/resources/country.txt had just one entry
> (spain). I haven't tried with multiple entries.
>
> $> hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job \
>      org.apache.mahout.classifier.bayes.WikipediaXmlSplitter \
>      -d $MAHOUT_HOME/examples/temp/enwiki-latest-pages-articles10.xml \
>      -o wikipedia/chunks -c 64
>
> $> hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job \
>      org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver \
>      -i wikipedia/chunks -o wikipediainput \
>      -c $MAHOUT_HOME/examples/src/test/resources/country.txt
>
> $> hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job \
>      org.apache.mahout.classifier.bayes.TrainClassifier \
>      -i wikipediainput -o wikipediamodel -type bayes -source hdfs
>
> $> hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job \
>      org.apache.mahout.classifier.bayes.TestClassifier \
>      -m wikipediamodel -d wikipediainput -ng 3 -type bayes -source hdfs
>
> Please try the above and let me know; we'll try to find out what is going
> wrong.
> Reg,
> Joe.
>
> On Sun, Sep 19, 2010 at 11:13 PM, Gangadhar Nittala <[email protected]>
> wrote:
>
>> Joe,
>> I also tried reducing the number of countries in country.txt.
>> That didn't help. And in my case, I was monitoring the disk space and
>> at no time did it reach 0%, so I am not sure that is the cause. To
>> remove the dependency on the number of countries, I even tried
>> subjects.txt as the classification - that also did not help.
>> I think this problem is due to the type of the data being processed,
>> but what I am not sure of is what I need to change to get the data
>> processed successfully.
>>
>> The experienced folks on Mahout will be able to tell us what is
>> missing, I guess.
>>
>> Thank you
>> Gangadhar
>>
>> On Sun, Sep 19, 2010 at 8:06 AM, Joe Kumar <[email protected]> wrote:
>> > Gangadhar,
>> >
>> > I modified $MAHOUT_HOME/examples/src/test/resources/country.txt to
>> > have just one entry (spain), used WikipediaDatasetCreatorDriver to
>> > create the wikipediainput data set, and then ran TrainClassifier,
>> > and it worked. When I ran TestClassifier as below, I got blank
>> > results in the output.
>> >
>> > hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job \
>> >      org.apache.mahout.classifier.bayes.TestClassifier \
>> >      -m wikipediamodel -d wikipediainput -ng 3 -type bayes -source hdfs
>> >
>> > Summary
>> > -------------------------------------------------------
>> > Correctly Classified Instances   : 0    ?%
>> > Incorrectly Classified Instances : 0    ?%
>> > Total Classified Instances       : 0
>> >
>> > =======================================================
>> > Confusion Matrix
>> > -------------------------------------------------------
>> > a    <--Classified as
>> > 0    | 0    a = spain
>> > Default Category: unknown: 1
>> >
>> > I am not sure if I am doing something wrong; I have to figure out
>> > why my output is so blank.
>> > I'll document these steps and mention country.txt in the wiki.
>> >
>> > A question for everyone:
>> > Should we have two country.txt files?
>> >
>> > 1. country_full_list.txt - the existing list
>> > 2. country_sample_list.txt - a list with 2 or 3 countries
>> >
>> > To get a flavor of the wikipedia bayes example, we could use
>> > country_sample_list.txt. When new people want to just try out the
>> > example, they can reference this file as a parameter.
>> > To run the example on a robust, scalable infrastructure, we could
>> > use country_full_list.txt.
>> > Any thoughts?
>> >
>> > regards
>> > Joe.
>> >
>> > On Sat, Sep 18, 2010 at 8:57 PM, Joe Kumar <[email protected]> wrote:
>> >
>> >> Gangadhar,
>> >>
>> >> After running TrainClassifier again, the map task failed with the
>> >> same exception, and I am pretty sure it is an issue with disk
>> >> space.
>> >> As the map was progressing, I watched my free disk space drop from
>> >> 81 GB. It came down to 0 when the map task was almost 66% done, and
>> >> then the exception happened. After the exception, another map task
>> >> resumed at 33% and I got close to 15 GB of free space back (I guess
>> >> the first map task freed up some space), and I am sure it would
>> >> drop to zero again and throw the same exception.
>> >> I am going to modify country.txt to just one country, recreate
>> >> wikipediainput, and run TrainClassifier. Will let you know how it
>> >> goes.
>> >>
>> >> Do we have any benchmarks / system requirements for running this
>> >> example? Has anyone else had success running it? Would appreciate
>> >> your input / thoughts.
>> >>
>> >> Should we look at tuning the code to handle these situations? Any
>> >> quick suggestions on where to start looking?
>> >>
>> >> regards,
>> >> Joe.
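The disk-space watching described in the thread can also be done programmatically. As a minimal stand-alone sketch (plain Java, nothing Mahout-specific; the path and sampling interval are illustrative), one can poll the usable space of the partition holding the job's temp directory, similar to watching df while the map task runs:

```java
import java.io.File;

/**
 * Poll free disk space on a partition, similar to watching `df` while a
 * job runs. Path and interval are illustrative, not from the thread.
 */
public class DiskSpaceWatch {

  /** Bytes available to this JVM on the partition containing {@code path}. */
  public static long freeBytes(String path) {
    return new File(path).getUsableSpace();
  }

  public static void main(String[] args) throws InterruptedException {
    // e.g. point this at hadoop.tmp.dir to watch intermediate output grow
    String path = args.length > 0 ? args[0] : "/tmp";
    for (int i = 0; i < 3; i++) {
      long gib = freeBytes(path) / (1024L * 1024L * 1024L);
      System.out.println(path + ": " + gib + " GiB free");
      Thread.sleep(1000); // sample once a second
    }
  }
}
```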
