On Tue, Jan 19, 2010 at 1:02 PM, Loek Cleophas <[email protected]> wrote:
> Are you sure about it reading from local dir?

Yes, absolutely.

> Note that I pass -source hdfs to the TestClassifier, and that when I try to
> run it instead with a full local path i.e. as:

The -source flag in TestClassifier applies only to the model (which can be in
hdfs or hbase). In sequential mode, the test files are read off the local
disk, whereas in mapreduce mode the test files are read off the hdfs.

> bin/hadoop jar
> ~/Downloads/mahout-0.2/examples/target/mahout-examples-0.2.job
> org.apache.mahout.classifier.bayes.TrainClassifier -i
> ~/Code/My_Eclipse_Workspace/apache-mahout/examples/work/20news-18828-collapse
> -o 8newsmodel-0.2 -ng 3 -type bayes -source hdfs

Like I said, the Trainer is completely map/reduce; it reads off the hdfs.

> I get the following exception, which seems to imply it is not reading the
> input from a local dir...:
>
> Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
> Input path does not exist:
> hdfs://localhost:9000/Users/loekcleophas/Code/My_Eclipse_Workspace/apache-mahout/examples/work/20news-18828-collapse
>
> On Jan 19, 2010, at 08:12, Robin Anil wrote:
>
>> Is it reading the directory correctly? Note, 8newsinput is read from
>> local dir.
>>
>> On Tue, Jan 19, 2010 at 12:39 PM, Loek Cleophas
>> <[email protected]> wrote:
>>
>>> Hi
>>>
>>> I've recently started working with Mahout. At first, I tried the trunk,
>>> which I got to compile (both from within Eclipse with a Maven plugin, and
>>> command line), but which apparently is in a state of flux regarding
>>> building and running the examples (?).
>>>
>>> I tried running the Twentynewsgroups classification example, after
>>> copying the relevant Maven file to the examples directory, as suggested
>>> on the mailing list some time ago. I could get the example's data set
>>> from wikipedia, could get it processed into input data located on the
>>> single-node/local hdfs, and could get a model trained and output to that
>>> hdfs.
>>> However, the example class TestClassifier to test with the trained
>>> model didn't work for me, neither in mapreduce nor in sequential mode. In
>>> the mapreduce case, even with quite high JVM maximum heap sizes (I tried
>>> 2048), I get heap space out-of-memory errors / object configuration
>>> errors. In the sequential case, I seemingly get 0 items classified, see
>>> output below. (Note that I reduced the data set to just 8 instead of 20
>>> newsgroups, thinking the data size might have something to do with the
>>> problem.)
>>>
>>> I also tried release 0.2, which I got to compile and for which I got the
>>> example running more easily, but still with the same errors when testing
>>> with the trained model. Any ideas what might be going wrong, or what I
>>> might be doing wrong?
>>>
>>> Kind regards,
>>> Loek Cleophas
>>>
>>>
>>> Output of TestClassifier:
>>>
>>> bin/hadoop jar
>>> ~/Downloads/mahout-0.2/examples/target/mahout-examples-0.2.job
>>> org.apache.mahout.classifier.bayes.TestClassifier -m 8newsmodel-0.2 -d
>>> 8newsInput -ng 3 -type bayes -source hdfs -method sequential
>>>
>>> <... reading all the feature weights ...>
>>>
>>> 10/01/13 10:22:08 INFO io.SequenceFileModelReader: Read 1950000 feature weights
>>> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2/trainer-weights/Sigma_k/part-00000
>>> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2/trainer-weights/Sigma_kSigma_j/part-00000
>>> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: 420716.6056712613
>>> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2/trainer-thetaNormalizer/part-00000
>>> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2/trainer-tfIdf/trainer-tfIdf/part-00000
>>> comp.windows.x -4443829.798557077 7727496.583973498 -0.5750671967650419
>>> comp.graphics -3252365.124498224 7727496.583973498 -0.4208821174044246
>>> soc.religion.christian -5106741.34456479 7727496.583973498 -0.6608532645819548
>>> alt.atheism -3447983.6168798 7727496.583973498 -0.44619671835646907
>>> misc.forsale -2276588.3662840202 7727496.583973498 -0.2946087832643716
>>> comp.sys.mac.hardware -2445489.855812473 7727496.583973498 -0.31646598988918556
>>> comp.os.ms-windows.misc -7727496.583973498 7727496.583973498 -1.0
>>> comp.sys.ibm.pc.hardware -2687646.590023761 7727496.583973498 -0.3478030123750332
>>> 10/01/13 10:23:17 INFO bayes.TestClassifier:
>>> nCalls = 0;
>>> sumTime = 0.0s;
>>> minTime = 0.0ms;
>>> maxTime = 0.0ms;
>>> meanTime = 0.0ms;
>>> stdDevTime = 0.0ms;
>>> 10/01/13 10:23:18 INFO bayes.TestClassifier:
>>> =======================================================
>>> Summary
>>> -------------------------------------------------------
>>> Correctly Classified Instances : 0 ?%
>>> Incorrectly Classified Instances : 0 ?%
>>> Total Classified Instances : 0
>>>
>>> =======================================================
>>> Confusion Matrix
>>> -------------------------------------------------------
>>> a b c d e f g h <--Classified as
>>> 0 0 0 0 0 0 0 0 | 0 a = comp.windows.x
>>> 0 0 0 0 0 0 0 0 | 0 b = comp.graphics
>>> 0 0 0 0 0 0 0 0 | 0 c = soc.religion.christian
>>> 0 0 0 0 0 0 0 0 | 0 d = alt.atheism
>>> 0 0 0 0 0 0 0 0 | 0 e = misc.forsale
>>> 0 0 0 0 0 0 0 0 | 0 f = comp.sys.mac.hardware
>>> 0 0 0 0 0 0 0 0 | 0 g = comp.os.ms-windows.misc
>>> 0 0 0 0 0 0 0 0 | 0 h = comp.sys.ibm.pc.hardware
>>> Default Category: unknown: 8
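
A note on why the exception in this thread shows an hdfs:// URI for what was typed as a local path: the shell expands ~ to /Users/loekcleophas before Hadoop sees the argument, and Hadoop then resolves a path that has no scheme against its default filesystem. Relative paths such as 8newsmodel-0.2 resolve under the user's HDFS home directory, which matches the model paths in the log output. The Python sketch below only mimics that resolution rule for illustration. The names qualify, default_fs, and user are hypothetical and are not Hadoop or Mahout APIs; the default values are taken from the paths quoted in this thread.

```python
from urllib.parse import urlparse


def qualify(path, default_fs="hdfs://localhost:9000", user="loekcleophas"):
    """Roughly mimic how a scheme-less path is resolved against a default filesystem.

    Illustration only: default_fs and user are assumptions based on the
    URIs that appear in the thread, not values read from any configuration.
    """
    if urlparse(path).scheme:
        # Already fully qualified (e.g. file:// or hdfs://): leave untouched.
        return path
    if path.startswith("/"):
        # Absolute path without a scheme: prefixed with the default filesystem.
        return default_fs + path
    # Relative path: resolved under the user's home directory on that filesystem.
    return f"{default_fs}/user/{user}/{path}"


# The shell has already turned "~/Code/..." into "/Users/loekcleophas/Code/...".
print(qualify("/Users/loekcleophas/Code/My_Eclipse_Workspace/apache-mahout/examples/work/20news-18828-collapse"))
print(qualify("8newsmodel-0.2"))
```

Under that reading, the training input would first have to be copied into HDFS (for example with hadoop fs -put) before a map/reduce job such as the Trainer could find it.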
