Is it reading the directory correctly? Note that 8newsInput (the -d argument) is read from the local directory, not from HDFS.
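
One quick way to check is to look at what actually sits in the local directory before running TestClassifier. The following is only a minimal sketch in Python and is not part of Mahout: the function name summarize_input_dir is made up for illustration, and it assumes the test data consists of plain files somewhere under the directory passed to -d.

```python
import os


def summarize_input_dir(path):
    """Return {relative file path: line count} for every file under path.

    Returns None when path is not a directory on the local filesystem,
    and an empty dict when the directory exists but holds no files.
    """
    if not os.path.isdir(path):
        return None
    summary = {}
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Read in binary mode so odd encodings in the posts cannot fail.
            with open(full, "rb") as handle:
                summary[os.path.relpath(full, path)] = sum(1 for _ in handle)
    return summary


def describe(path):
    """Turn the summary into a short human-readable report string."""
    summary = summarize_input_dir(path)
    if summary is None:
        return path + ": not a local directory"
    if not summary:
        return path + ": directory exists but contains no files"
    return "\n".join(
        "{}: {} lines".format(rel, count) for rel, count in sorted(summary.items())
    )
```

Calling print(describe("8newsInput")) from the directory where bin/hadoop is run would show whether the classifier has anything to read there. If the report says the directory is missing or empty, that would be consistent with nCalls = 0 in the output below.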

On Tue, Jan 19, 2010 at 12:39 PM, Loek Cleophas <[email protected]> wrote:

> Hi
>
> I've recently started working with Mahout. At first, I tried the trunk,
> which I got to compile (both from within Eclipse with a Maven plugin, and
> command line), but which apparently is in a state of flux regarding building
> and running the examples (?).
>
> I tried running the Twentynewsgroups classification example, after copying
> the relevant Maven file to the examples directory, as suggested on the
> mailing list some time ago. I could get the example's data set from
> wikipedia, could get it processed into input data located on the
> single-node/local hdfs, and could get a model trained and output to that
> hdfs. However, the example class TestClassifier to test with the trained
> model didn't work for me, neither in mapreduce nor in sequential mode. In
> the mapreduce case, and even with quite high JVM maximum heap sizes (I tried
> 2048), I get heapspace out of memory errors / object configuration errors.
> In the sequential case, I seemingly get 0 items classified, see output
> below. (Note that I reduced the data set to just 8 instead of 20 newsgroups,
> thinking the data size might have something to do with the problem.)
>
> I also tried release 0.2, which I got to compile and for which I got the
> example running more easily, but still with the same errors when testing
> with the trained model. Any ideas what might be going wrong, or what I might
> be doing wrong?
>
> Kind regards,
> Loek Cleophas
>
>
> Output of TestClassifier:
>
> bin/hadoop jar \
>   ~/Downloads/mahout-0.2/examples/target/mahout-examples-0.2.job \
>   org.apache.mahout.classifier.bayes.TestClassifier \
>   -m 8newsmodel-0.2 -d 8newsInput -ng 3 -type bayes -source hdfs -method sequential
>
> <... reading all the feature weights ...>
>
> 10/01/13 10:22:08 INFO io.SequenceFileModelReader: Read 1950000 feature weights
> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2/trainer-weights/Sigma_k/part-00000
> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2/trainer-weights/Sigma_kSigma_j/part-00000
> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: 420716.6056712613
> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2/trainer-thetaNormalizer/part-00000
> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2/trainer-tfIdf/trainer-tfIdf/part-00000
> comp.windows.x -4443829.798557077 7727496.583973498 -0.5750671967650419
> comp.graphics -3252365.124498224 7727496.583973498 -0.4208821174044246
> soc.religion.christian -5106741.34456479 7727496.583973498 -0.6608532645819548
> alt.atheism -3447983.6168798 7727496.583973498 -0.44619671835646907
> misc.forsale -2276588.3662840202 7727496.583973498 -0.2946087832643716
> comp.sys.mac.hardware -2445489.855812473 7727496.583973498 -0.31646598988918556
> comp.os.ms-windows.misc -7727496.583973498 7727496.583973498 -1.0
> comp.sys.ibm.pc.hardware -2687646.590023761 7727496.583973498 -0.3478030123750332
> 10/01/13 10:23:17 INFO bayes.TestClassifier:
> nCalls = 0;
> sumTime = 0.0s;
> minTime = 0.0ms;
> maxTime = 0.0ms;
> meanTime = 0.0ms;
> stdDevTime = 0.0ms;
> 10/01/13 10:23:18 INFO bayes.TestClassifier:
> =======================================================
> Summary
> -------------------------------------------------------
> Correctly Classified Instances   : 0 ?%
> Incorrectly Classified Instances : 0 ?%
> Total Classified Instances       : 0
>
> =======================================================
> Confusion Matrix
> -------------------------------------------------------
> a b c d e f g h   <--Classified as
> 0 0 0 0 0 0 0 0 | 0   a = comp.windows.x
> 0 0 0 0 0 0 0 0 | 0   b = comp.graphics
> 0 0 0 0 0 0 0 0 | 0   c = soc.religion.christian
> 0 0 0 0 0 0 0 0 | 0   d = alt.atheism
> 0 0 0 0 0 0 0 0 | 0   e = misc.forsale
> 0 0 0 0 0 0 0 0 | 0   f = comp.sys.mac.hardware
> 0 0 0 0 0 0 0 0 | 0   g = comp.os.ms-windows.misc
> 0 0 0 0 0 0 0 0 | 0   h = comp.sys.ibm.pc.hardware
> Default Category: unknown: 8
