On Tue, Jan 19, 2010 at 2:30 PM, Loek Cleophas <[email protected]> wrote:

> Hi again
>
> My apologies: the results in my previous e-mail were a result of
> inadvertently running *TrainClassifier* with the -i parameter using a
> relative local path vs. one on the DFS. Naturally, since my problem was
> with *TestClassifier*, I should've run that with the adapted -i parameter
> value. (Must have been the lack of morning coffee.)
>
> I have now rerun TrainClassifier to reconstruct the model, and run
> TestClassifier with:
>
> bin/hadoop jar
>   ~/Downloads/mahout-0.2/examples/target/mahout-examples-0.2.job
>   org.apache.mahout.classifier.bayes.TestClassifier -m 8newsmodel-0.2
>   -d ~/Code/My_Eclipse_Workspace/apache-mahout/examples/work/20news-18828-collapse
>   -ng 3 -type bayes -source hdfs -method sequential
>
> That solved the problem. Thanks a lot for that useful remark about the
> input for the TestClassifier needing to come from the local file system.
> I'll now go and sit in the 'feeling silly' corner :)

There is nothing silly about it. I missed documenting that particular info.
So I guess I am in the silly corner :P

When you run it against some dataset other than 20 newsgroups, please tell
us how it goes, so that we can take that feedback into improving it.

Regards
Robin

> Best wishes,
> Loek
>
> On Jan 19, 2010, at 08:45, Robin Anil wrote:
>
>> On Tue, Jan 19, 2010 at 1:02 PM, Loek Cleophas <[email protected]> wrote:
>>
>>> Are you sure about it reading from local dir?
>>
>> Yes, absolutely
>>
>>> Note that I pass -source hdfs to the TestClassifier, and that when I
>>> try to run it instead with a full local path i.e. as:
>>
>> That source flag in TestClassifier is only for the model (it can be in
>> hdfs or hbase).
>>
>> In sequential mode, the test files are read off the local disk, whereas
>> in mapreduce mode the test files are read off the hdfs.
>>
>>> bin/hadoop jar
>>>   ~/Downloads/mahout-0.2/examples/target/mahout-examples-0.2.job
>>>   org.apache.mahout.classifier.bayes.TrainClassifier
>>>   -i ~/Code/My_Eclipse_Workspace/apache-mahout/examples/work/20news-18828-collapse
>>>   -o 8newsmodel-0.2 -ng 3 -type bayes -source hdfs
>>
>> Like I said, Trainer is completely map/reduce; it reads off the hdfs.
>>
>>> I get the following exception, which seems to imply it is not reading
>>> the input from a local dir...:
>>>
>>> Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
>>>   Input path does not exist:
>>>   hdfs://localhost:9000/Users/loekcleophas/Code/My_Eclipse_Workspace/apache-mahout/examples/work/20news-18828-collapse
>>>
>>> On Jan 19, 2010, at 08:12, Robin Anil wrote:
>>>
>>>> Is it reading the directory correctly? Note, 8newsinput is read from
>>>> local dir.
>>>>
>>>> On Tue, Jan 19, 2010 at 12:39 PM, Loek Cleophas <[email protected]> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> I've recently started working with Mahout. At first, I tried the
>>>>> trunk, which I got to compile (both from within Eclipse with a Maven
>>>>> plugin, and command line), but which apparently is in a state of flux
>>>>> regarding building and running the examples (?).
>>>>>
>>>>> I tried running the Twentynewsgroups classification example, after
>>>>> copying the relevant Maven file to the examples directory, as
>>>>> suggested on the mailing list some time ago. I could get the example's
>>>>> data set from wikipedia, could get it processed into input data
>>>>> located on the single-node/local hdfs, and could get a model trained
>>>>> and output to that hdfs. However, the example class TestClassifier to
>>>>> test with the trained model didn't work for me, neither in mapreduce
>>>>> nor in sequential mode.
>>>>> In the mapreduce case, and even with quite high JVM maximum heap
>>>>> sizes (I tried 2048), I get heapspace out of memory errors / object
>>>>> configuration errors. In the sequential case, I seemingly get 0 items
>>>>> classified, see output below. (Note that I reduced the data set to
>>>>> just 8 instead of 20 newsgroups, thinking the data size might have
>>>>> something to do with the problem.)
>>>>>
>>>>> I also tried release 0.2, which I got to compile and for which I got
>>>>> the example running more easily, but still with the same errors when
>>>>> testing with the trained model. Any ideas what might be going wrong,
>>>>> or what I might be doing wrong?
>>>>>
>>>>> Kind regards,
>>>>> Loek Cleophas
>>>>>
>>>>> Output of TestClassifier:
>>>>>
>>>>> bin/hadoop jar
>>>>>   ~/Downloads/mahout-0.2/examples/target/mahout-examples-0.2.job
>>>>>   org.apache.mahout.classifier.bayes.TestClassifier -m 8newsmodel-0.2
>>>>>   -d 8newsInput -ng 3 -type bayes -source hdfs -method sequential
>>>>>
>>>>> <... reading all the feature weights ...>
>>>>>
>>>>> 10/01/13 10:22:08 INFO io.SequenceFileModelReader: Read 1950000 feature weights
>>>>> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2/trainer-weights/Sigma_k/part-00000
>>>>> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2/trainer-weights/Sigma_kSigma_j/part-00000
>>>>> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: 420716.6056712613
>>>>> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2/trainer-thetaNormalizer/part-00000
>>>>> 10/01/13 10:22:11 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2/trainer-tfIdf/trainer-tfIdf/part-00000
>>>>> comp.windows.x -4443829.798557077 7727496.583973498 -0.5750671967650419
>>>>> comp.graphics -3252365.124498224 7727496.583973498 -0.4208821174044246
>>>>> soc.religion.christian -5106741.34456479 7727496.583973498 -0.6608532645819548
>>>>> alt.atheism -3447983.6168798 7727496.583973498 -0.44619671835646907
>>>>> misc.forsale -2276588.3662840202 7727496.583973498 -0.2946087832643716
>>>>> comp.sys.mac.hardware -2445489.855812473 7727496.583973498 -0.31646598988918556
>>>>> comp.os.ms-windows.misc -7727496.583973498 7727496.583973498 -1.0
>>>>> comp.sys.ibm.pc.hardware -2687646.590023761 7727496.583973498 -0.3478030123750332
>>>>> 10/01/13 10:23:17 INFO bayes.TestClassifier:
>>>>> nCalls = 0;
>>>>> sumTime = 0.0s;
>>>>> minTime = 0.0ms;
>>>>> maxTime = 0.0ms;
>>>>> meanTime = 0.0ms;
>>>>> stdDevTime = 0.0ms;
>>>>> 10/01/13 10:23:18 INFO bayes.TestClassifier:
>>>>> =======================================================
>>>>> Summary
>>>>> -------------------------------------------------------
>>>>> Correctly Classified Instances   : 0  ?%
>>>>> Incorrectly Classified Instances : 0  ?%
>>>>> Total Classified Instances       : 0
>>>>>
>>>>> =======================================================
>>>>> Confusion Matrix
>>>>> -------------------------------------------------------
>>>>> a  b  c  d  e  f  g  h   <--Classified as
>>>>> 0  0  0  0  0  0  0  0 | 0   a = comp.windows.x
>>>>> 0  0  0  0  0  0  0  0 | 0   b = comp.graphics
>>>>> 0  0  0  0  0  0  0  0 | 0   c = soc.religion.christian
>>>>> 0  0  0  0  0  0  0  0 | 0   d = alt.atheism
>>>>> 0  0  0  0  0  0  0  0 | 0   e = misc.forsale
>>>>> 0  0  0  0  0  0  0  0 | 0   f = comp.sys.mac.hardware
>>>>> 0  0  0  0  0  0  0  0 | 0   g = comp.os.ms-windows.misc
>>>>> 0  0  0  0  0  0  0  0 | 0   h = comp.sys.ibm.pc.hardware
>>>>> Default Category: unknown: 8

--
------
Robin Anil
Blog: http://techdigger.wordpress.com
-------
Try out Swipeball for iPhone
Video: http://www.youtube.com/watch?v=3hvEbWHciwU
iTunes: http://itunes.com/apps/swipeball
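
The rule that resolved this thread (in sequential mode TestClassifier reads
the -d test data off the local disk, in mapreduce mode off the hdfs, and
-source only says where the model lives) can be sketched as a small Python
helper that assembles the command line. This is only an illustration, not
part of Mahout: the function name build_test_command is hypothetical, and
"mapreduce" as the value for -method is an assumption based on how the mode
is named in the thread. The flags, jar and class name are the ones used in
the commands quoted above.

```python
import os

# Where TestClassifier looks for the -d test data, per the thread:
# sequential mode reads the local disk, mapreduce mode reads HDFS.
# ("mapreduce" as a -method value is an assumption, not confirmed here.)
TEST_DATA_FILESYSTEM = {"sequential": "local", "mapreduce": "hdfs"}


def build_test_command(model, test_dir, method, job_jar):
    """Return the argv list for a TestClassifier run (hypothetical helper)."""
    if method not in TEST_DATA_FILESYSTEM:
        raise ValueError("unknown method: %r" % (method,))
    if TEST_DATA_FILESYSTEM[method] == "local":
        # Make the local path absolute so it cannot be mistaken for a
        # path relative to the HDFS home directory.
        test_dir = os.path.abspath(os.path.expanduser(test_dir))
    # In mapreduce mode the path is passed through unchanged and is
    # resolved against HDFS by Hadoop itself.
    return [
        "bin/hadoop", "jar", job_jar,
        "org.apache.mahout.classifier.bayes.TestClassifier",
        "-m", model,          # model location; -source says hdfs or hbase
        "-d", test_dir,       # test data; file system depends on -method
        "-ng", "3",
        "-type", "bayes",
        "-source", "hdfs",
        "-method", method,
    ]


if __name__ == "__main__":
    cmd = build_test_command(
        "8newsmodel-0.2",
        "~/Code/My_Eclipse_Workspace/apache-mahout/examples/work/20news-18828-collapse",
        "sequential",
        "~/Downloads/mahout-0.2/examples/target/mahout-examples-0.2.job",
    )
    print(" ".join(cmd))
```

Run with the arguments shown, this prints a command of the same shape as the
one that worked for Loek, with the local test directory expanded to an
absolute path.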
