Is it reading the directory correctly? Note that 8newsInput is read from the
local directory.
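
One quick way to check, as a minimal sketch: count the regular files under the input path on the local filesystem. The class InputDirCheck and its countFiles method are hypothetical names made up for this illustration and are not part of Mahout or Hadoop. The sketch assumes 8newsInput is a path relative to the current working directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, not part of Mahout: inspects a directory on the LOCAL filesystem.
public class InputDirCheck {

    // Returns the number of regular files found recursively under dir,
    // or -1 if dir does not exist or is not a directory on the local filesystem.
    public static long countFiles(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return -1;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile).count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the directory name used in the command in this thread (an assumption).
        Path dir = Paths.get(args.length > 0 ? args[0] : "8newsInput");
        long n = countFiles(dir);
        if (n < 0) {
            System.out.println(dir.toAbsolutePath() + " is not a local directory");
        } else {
            System.out.println(dir.toAbsolutePath() + " contains " + n + " file(s)");
        }
    }
}
```

If this reports a missing directory or 0 files while the data sits only on HDFS, that would be consistent with the nCalls = 0 result, though it does not prove it is the cause.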

On Tue, Jan 19, 2010 at 12:39 PM, Loek Cleophas
<[email protected]> wrote:

> Hi
>
> I've recently started working with Mahout. At first, I tried the trunk,
> which I got to compile (both from within Eclipse with a Maven plugin and
> from the command line), but which apparently is in a state of flux
> regarding building and running the examples (?).
>
> I tried running the Twenty Newsgroups classification example, after copying
> the relevant Maven file to the examples directory, as suggested on the
> mailing list some time ago. I could get the example's data set from
> wikipedia, could get it processed into input data located on the
> single-node/local HDFS, and could get a model trained and output to that
> HDFS. However, the example class TestClassifier, which tests with the
> trained model, didn't work for me, in either mapreduce or sequential mode.
> In the mapreduce case, even with quite high JVM maximum heap sizes (I tried
> 2048), I get heap-space out-of-memory errors / object configuration errors.
> In the sequential case, I seemingly get 0 items classified, see output
> below. (Note that I reduced the data set to just 8 instead of 20 newsgroups,
> thinking the data size might have something to do with the problem.)
>
> I also tried release 0.2, which I got to compile and for which I got the
> example running more easily, but still with the same errors when testing
> with the trained model. Any ideas what might be going wrong, or what I might
> be doing wrong?
>
> Kind regards,
> Loek Cleophas
>
>
> Output of TestClassifier:
>
> bin/hadoop jar
> ~/Downloads/mahout-0.2/examples/target/mahout-examples-0.2.job
> org.apache.mahout.classifier.bayes.TestClassifier -m 8newsmodel-0.2 -d
> 8newsInput -ng 3 -type bayes -source hdfs -method sequential
>
> <... reading all the feature weights ...>
>
> 10/01/13 10:22:08 INFO io.SequenceFileModelReader: Read 1950000 feature weights
> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2/trainer-weights/Sigma_k/part-00000
> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2/trainer-weights/Sigma_kSigma_j/part-00000
> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: 420716.6056712613
> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2/trainer-thetaNormalizer/part-00000
> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2/trainer-tfIdf/trainer-tfIdf/part-00000
> comp.windows.x -4443829.798557077 7727496.583973498 -0.5750671967650419
> comp.graphics -3252365.124498224 7727496.583973498 -0.4208821174044246
> soc.religion.christian -5106741.34456479 7727496.583973498 -0.6608532645819548
> alt.atheism -3447983.6168798 7727496.583973498 -0.44619671835646907
> misc.forsale -2276588.3662840202 7727496.583973498 -0.2946087832643716
> comp.sys.mac.hardware -2445489.855812473 7727496.583973498 -0.31646598988918556
> comp.os.ms-windows.misc -7727496.583973498 7727496.583973498 -1.0
> comp.sys.ibm.pc.hardware -2687646.590023761 7727496.583973498 -0.3478030123750332
> 10/01/13 10:23:17 INFO bayes.TestClassifier:
> nCalls = 0;
> sumTime = 0.0s;
> minTime = 0.0ms;
> maxTime = 0.0ms;
> meanTime = 0.0ms;
> stdDevTime = 0.0ms;
> 10/01/13 10:23:18 INFO bayes.TestClassifier:
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances          :          0             ?%
> Incorrectly Classified Instances        :          0             ?%
> Total Classified Instances              :          0
>
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a       b       c       d       e       f       g       h       <--Classified as
> 0       0       0       0       0       0       0       0        |  0     a     = comp.windows.x
> 0       0       0       0       0       0       0       0        |  0     b     = comp.graphics
> 0       0       0       0       0       0       0       0        |  0     c     = soc.religion.christian
> 0       0       0       0       0       0       0       0        |  0     d     = alt.atheism
> 0       0       0       0       0       0       0       0        |  0     e     = misc.forsale
> 0       0       0       0       0       0       0       0        |  0     f     = comp.sys.mac.hardware
> 0       0       0       0       0       0       0       0        |  0     g     = comp.os.ms-windows.misc
> 0       0       0       0       0       0       0       0        |  0     h     = comp.sys.ibm.pc.hardware
> Default Category: unknown: 8
>
>
