Gangadhar,

I modified $MAHOUT_HOME/examples/src/test/resources/country.txt to have just
one entry (spain), used WikipediaDatasetCreatorDriver to create the
wikipediainput data set, and then ran TrainClassifier, which worked. When I
ran TestClassifier as below, the output results were blank.

hadoop jar $MAHOUT_HOME/examples/target/mahout-examples-0.4-SNAPSHOT.job \
  org.apache.mahout.classifier.bayes.TestClassifier \
  -m wikipediamodel -d wikipediainput -ng 3 -type bayes -source hdfs

Summary
-------------------------------------------------------
Correctly Classified Instances          :          0         ?%
Incorrectly Classified Instances        :          0         ?%
Total Classified Instances              :          0

=======================================================
Confusion Matrix
-------------------------------------------------------
a     <--Classified as
0     |  0     a     = spain
Default Category: unknown: 1

I am not sure whether I am doing something wrong; I still have to figure out
why the output is blank.
I'll document these steps in the wiki and mention country.txt there.

A question for everyone: should we have two country.txt files?

   1. country_full_list.txt - this is the existing list
   2. country_sample_list.txt - a list with 2 or 3 countries

To get a flavor of the Wikipedia Bayes example, new users who just want to
try it out could pass country_sample_list.txt as a parameter. To run the
example on robust, scalable infrastructure, we could use
country_full_list.txt.
Any thoughts?
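If the two-file approach is adopted, the sample list could be derived from the full list instead of being maintained by hand. This is a minimal sketch, assuming country.txt holds one country name per line; the function name make_sample_list and both file names are hypothetical (taken from the proposal above, not existing Mahout files):

```shell
#!/bin/sh
# Hypothetical helper: copy the first N country names (default 3) from the
# full list into a small sample list for quick trial runs.
# Assumes one country name per line in the input file.
make_sample_list() {
  full="$1"
  sample="$2"
  n="${3:-3}"
  if [ ! -r "$full" ]; then
    echo "cannot read $full" >&2
    return 1
  fi
  head -n "$n" "$full" > "$sample"
}

# Usage (hypothetical file names):
# res="$MAHOUT_HOME/examples/src/test/resources"
# make_sample_list "$res/country_full_list.txt" "$res/country_sample_list.txt" 3
```

The generated sample file would then be given to WikipediaDatasetCreatorDriver in place of the full list, so the full list stays the single source of truth.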

regards
Joe.

On Sat, Sep 18, 2010 at 8:57 PM, Joe Kumar wrote:

> Gangadhar,
>
> After running TrainClassifier again, the map task failed with the same
> exception, and I am pretty sure it is a disk-space issue.
> While the map was progressing, I watched my free disk space drop from
> 81GB to 0 at about 66% through the map task, and then the exception
> happened. After the exception, another map task resumed at 33% and free
> space went back up to about 15GB (I guess the first map task freed up some
> space), but I am sure it will drop to zero again and throw the same
> exception.
> I am going to modify country.txt to have just one country, recreate
> wikipediainput and run TrainClassifier again. I will let you know how it
> goes.
>
> Do we have any benchmarks or system requirements for running this example?
> Has anyone else run this example successfully? I would appreciate your
> input and thoughts.
>
> Should we look at tuning the code to handle these situations? Any quick
> suggestions on where to start looking?
>
> regards,
> Joe.
>
>
>
>
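The disk-space diagnosis in the quoted message can be confirmed by logging free space while the job runs. This is a minimal sketch, assuming a default Hadoop 0.20 setup in which map-side intermediate output goes to mapred.local.dir, which by default lives under hadoop.tmp.dir (/tmp/hadoop-${user.name}); the function names free_kb and watch_free_space are hypothetical:

```shell
#!/bin/sh
# Print the space available (in KB) on the filesystem holding directory $1.
# df -P forces the portable one-line-per-filesystem format, in which column 4
# of the second line is the available space in 1K blocks.
free_kb() {
  df -Pk "${1:-/tmp}" | awk 'NR == 2 { print $4 }'
}

# Log free space for directory $1 every $2 seconds (default: 60) until
# interrupted with Ctrl-C.
watch_free_space() {
  dir="${1:-/tmp}"
  interval="${2:-60}"
  while :; do
    echo "$(date '+%H:%M:%S') $(free_kb "$dir") KB free under $dir"
    sleep "$interval"
  done
}

# Usage, in a second terminal while TrainClassifier runs. The path is the
# assumed default location; it must already exist, so adjust it to your
# mapred.local.dir if that is configured elsewhere:
# watch_free_space "/tmp/hadoop-$USER" 30
```

Watching the directory rather than the whole machine helps tell whether it is the map-side spill files, rather than something else on the disk, that are consuming the space.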
