Gangadhar,

Just to eliminate the usual suspects: I am using Mac OS X 10.5.8, Mahout 0.4
(revision 986659), Hadoop 0.20.2, 2 GB of memory for Hadoop, and 80 GB of free
disk space. The commands that I executed are below.

I had issues with my namenode, so I reformatted it using hadoop namenode
-format.
$MAHOUT_HOME/examples/src/test/resources/country.txt had just 1 entry
(spain). I haven't tried with multiple entries.

$> hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job
org.apache.mahout.classifier.bayes.WikipediaXmlSplitter -d
$MAHOUT_HOME/examples/temp/enwiki-latest-pages-articles10.xml -o
wikipedia/chunks -c 64

$> hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job
org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver -i
wikipedia/chunks -o wikipediainput -c
$MAHOUT_HOME/examples/src/test/resources/country.txt

$> hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job
org.apache.mahout.classifier.bayes.TrainClassifier -i wikipediainput -o
wikipediamodel  -type bayes -source hdfs

$> hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job
org.apache.mahout.classifier.bayes.TestClassifier -m wikipediamodel -d
 wikipediainput  -ng 3 -type bayes -source hdfs
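
In case it helps, the four steps above could be wrapped in a small script
along these lines. This is only a sketch: the run helper and the DRY_RUN, JOB
and PKG names are made up here and are not part of Mahout or Hadoop. The
paths and options are the same ones used in the commands above.

```shell
#!/bin/sh
# Sketch only: runs the four steps above in order and stops at the first failure.
# run(), DRY_RUN, JOB and PKG are names invented for this sketch.
set -e

JOB="$MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job"
PKG="org.apache.mahout.classifier.bayes"

# With DRY_RUN=1 (the default here) each command is only printed.
# Set DRY_RUN=0 to actually execute the commands.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# 1. Split the Wikipedia dump into chunks.
run hadoop jar "$JOB" "${PKG}.WikipediaXmlSplitter" \
  -d "$MAHOUT_HOME/examples/temp/enwiki-latest-pages-articles10.xml" \
  -o wikipedia/chunks -c 64

# 2. Create the data set for the entries listed in country.txt.
run hadoop jar "$JOB" "${PKG}.WikipediaDatasetCreatorDriver" \
  -i wikipedia/chunks -o wikipediainput \
  -c "$MAHOUT_HOME/examples/src/test/resources/country.txt"

# 3. Train the Bayes model.
run hadoop jar "$JOB" "${PKG}.TrainClassifier" \
  -i wikipediainput -o wikipediamodel -type bayes -source hdfs

# 4. Test the model against the same input.
run hadoop jar "$JOB" "${PKG}.TestClassifier" \
  -m wikipediamodel -d wikipediainput -ng 3 -type bayes -source hdfs
```

Running it as is only prints the four commands, so they can be checked
first. Running it with DRY_RUN=0 executes them, and set -e stops the script
as soon as one step fails.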

Please try the above and let me know. We'll try to find out what is going
wrong.

Regards,
Joe.

On Sun, Sep 19, 2010 at 11:13 PM, Gangadhar Nittala <[email protected]> wrote:

> Joe,
> I also tried reducing the number of countries in country.txt.
> That didn't help. And in my case, I was monitoring the disk space and
> at no time did it reach 0%. So I am not sure that disk space is the cause.
> To remove the dependency on the number of countries, I even tried with
> subjects.txt as the classification, and that did not help either.
> I think this problem is due to the type of the data being processed,
> but what I am not sure of is what I need to change to get the data to
> be processed successfully.
>
> I guess the experienced folks on Mahout will be able to tell us what is
> missing.
>
> Thank you
> Gangadhar
>
> On Sun, Sep 19, 2010 at 8:06 AM, Joe Kumar <[email protected]> wrote:
> > Gangadhar,
> >
> > I modified $MAHOUT_HOME/examples/src/test/resources/country.txt to have
> > just 1 entry (spain) and used WikipediaDatasetCreatorDriver to create the
> > wikipediainput data set. I then ran TrainClassifier and it worked. When I
> > ran TestClassifier as below, I got blank results in the output.
> >
> > $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job
> > org.apache.mahout.classifier.bayes.TestClassifier -m wikipediamodel -d
> >  wikipediainput  -ng 3 -type bayes -source hdfs
> >
> > Summary
> > -------------------------------------------------------
> > Correctly Classified Instances          :          0         ?%
> > Incorrectly Classified Instances        :          0         ?%
> > Total Classified Instances              :          0
> >
> > =======================================================
> > Confusion Matrix
> > -------------------------------------------------------
> > a     <--Classified as
> > 0     |  0     a     = spain
> > Default Category: unknown: 1
> >
> > I am not sure if I am doing something wrong. I have to figure out why my
> > output is so blank.
> > I'll document these steps and mention country.txt in the wiki.
> >
> > Question to all:
> > Should we have 2 country.txt files?
> >
> >   1. country_full_list.txt - this is the existing list
> >   2. country_sample_list.txt - a list with 2 or 3 countries
> >
> > To get a flavor of the wikipedia bayes example, we can use
> > country_sample_list.txt. When new people want to just try out the example,
> > they can reference this txt file as a parameter.
> > To run the example in a robust scalable infrastructure, we could use
> > country_full_list.txt.
> > Any thoughts?
> >
> > regards
> > Joe.
> >
> > On Sat, Sep 18, 2010 at 8:57 PM, Joe Kumar <[email protected]> wrote:
> >
> >> Gangadhar,
> >>
> >> After running TrainClassifier again, the map task just failed with the
> >> same exception and I am pretty sure it is an issue with disk space.
> >> As the map was progressing, I was monitoring my free disk space dropping
> >> from 81GB. It came down to 0 after almost 66% through the map task and
> >> then the exception happened. After the exception, another map task was
> >> resuming at 33% and I got close to 15GB free space (I guess the first map
> >> task freed up some space), and I am sure it would drop down to zero again
> >> and throw the same exception.
> >> I am going to modify country.txt to just 1 country, recreate
> >> wikipediainput and run TrainClassifier. Will let you know how it goes.
> >>
> >> Do we have any benchmarks / system requirements for running this example?
> >> Has anyone else had success running this example? Would appreciate
> >> your inputs / thoughts.
> >>
> >> Should we look at tuning the code for handling these situations? Any quick
> >> suggestions on where to start looking?
> >>
> >> regards,
> >> Joe.
