Hi again

My apologies: the results in my previous e-mail came from inadvertently running *TrainClassifier* with the -i parameter set to a local path rather than one on the DFS. Naturally, since my problem was with *TestClassifier*, I should have run that with the adapted input path instead (its -d parameter). (Must have been the lack of morning coffee.)

I have now rerun TrainClassifier to reconstruct the model, and run TestClassifier with:

bin/hadoop jar ~/Downloads/mahout-0.2/examples/target/mahout-examples-0.2.job org.apache.mahout.classifier.bayes.TestClassifier -m 8newsmodel-0.2 -d ~/Code/My_Eclipse_Workspace/apache-mahout/examples/work/20news-18828-collapse -ng 3 -type bayes -source hdfs -method sequential

That solved the problem. Thanks a lot for that useful remark about the input for the TestClassifier needing to come from the local file system. I'll now go and sit in the 'feeling silly' corner :)

Best wishes,
Loek


On Jan 19, 2010, at 08:45, Robin Anil wrote:

On Tue, Jan 19, 2010 at 1:02 PM, Loek Cleophas <[email protected]> wrote:

Are you sure about it reading from local dir?

Yes, absolutely

Note that I pass -source hdfs to the TestClassifier, and that when I try to
run it instead with a full local path i.e. as:

That source flag in TestClassifier is only for the model (it can be in hdfs
or hbase)

In sequential mode, the test files are read off the local disk, whereas in
mapreduce mode the test files are read off the hdfs.


bin/hadoop jar
~/Downloads/mahout-0.2/examples/target/mahout-examples-0.2.job
org.apache.mahout.classifier.bayes.TrainClassifier -i
~/Code/My_Eclipse_Workspace/apache-mahout/examples/work/20news-18828-collapse
-o 8newsmodel-0.2 -ng 3 -type bayes -source hdfs

Like I said, the Trainer is completely map/reduce; it reads off the hdfs.



I get the following exception, which seems to imply it is not reading the
input from a local dir...:

Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
Input path does not exist:
hdfs://localhost:9000/Users/loekcleophas/Code/My_Eclipse_Workspace/apache-mahout/examples/work/20news-18828-collapse
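
The path in that exception looks like what one would expect if Hadoop qualifies a path argument that carries no scheme against the default file system: an absolute path is placed directly on the DFS, and a relative one under the user's DFS home directory (compare the hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2 paths in the TestClassifier log further down). A minimal shell sketch of that rule follows. Note that resolve_on_dfs is a hypothetical helper written purely for illustration, not a Hadoop or Mahout command, and the /user/<name> home directory is an assumption based on the paths seen in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop or Mahout): mimics how a path
# argument without a scheme ends up qualified on the default file system.
# Assumption: the DFS home directory is /user/<name>, as in this thread's logs.
resolve_on_dfs() {
  default_fs=$1   # e.g. hdfs://localhost:9000
  user=$2         # DFS user name
  path=$3         # path as passed on the command line, after ~ expansion
  case "$path" in
    *://*) printf '%s\n' "$path" ;;                                   # already has a scheme
    /*)    printf '%s%s\n' "$default_fs" "$path" ;;                   # absolute path
    *)     printf '%s/user/%s/%s\n' "$default_fs" "$user" "$path" ;;  # relative path
  esac
}

# The shell expands ~ to the local home directory before Hadoop sees the path.
resolve_on_dfs hdfs://localhost:9000 loekcleophas /Users/loekcleophas/Code/My_Eclipse_Workspace/apache-mahout/examples/work/20news-18828-collapse
resolve_on_dfs hdfs://localhost:9000 loekcleophas 8newsmodel-0.2
```

Run with sh, the first call prints the path from the exception above and the second prints the model location that appears in the log. This is why a ~/... path only works where the tool reads from the local disk.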



On Jan 19, 2010, at 08:12, Robin Anil wrote:

Is it reading the directory correctly? Note, 8newsinput is read from local
dir.





On Tue, Jan 19, 2010 at 12:39 PM, Loek Cleophas <[email protected]> wrote:

Hi

I've recently started working with Mahout. At first, I tried the trunk, which I got to compile (both from within Eclipse with a Maven plugin and from the command line), but which apparently is in a state of flux regarding building and running the examples (?).

I tried running the Twentynewsgroups classification example, after copying the relevant Maven file to the examples directory, as suggested on the mailing list some time ago. I could get the example's data set from wikipedia, could get it processed into input data located on the single-node/local hdfs, and could get a model trained and output to that hdfs. However, the example class TestClassifier, used to test with the trained model, didn't work for me, neither in mapreduce nor in sequential mode. In the mapreduce case, even with quite high JVM maximum heap sizes (I tried 2048), I get heap space out-of-memory errors / object configuration errors. In the sequential case, I seemingly get 0 items classified; see the output below. (Note that I reduced the data set to just 8 instead of 20 newsgroups, thinking the data size might have something to do with the problem.)

I also tried release 0.2, which I got to compile and for which I got the example running more easily, but still with the same errors when testing with the trained model. Any ideas what might be going wrong, or what I might be doing wrong?

Kind regards,
Loek Cleophas


Output of TestClassifier:

bin/hadoop jar
~/Downloads/mahout-0.2/examples/target/mahout-examples-0.2.job
org.apache.mahout.classifier.bayes.TestClassifier -m 8newsmodel-0.2 -d
8newsInput -ng 3 -type bayes -source hdfs -method sequential

<... reading all the feature weights ...>

10/01/13 10:22:08 INFO io.SequenceFileModelReader: Read 1950000 feature weights
10/01/13 10:22:11 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2/trainer-weights/Sigma_k/part-00000
10/01/13 10:22:11 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2/trainer-weights/Sigma_kSigma_j/part-00000
10/01/13 10:22:11 INFO io.SequenceFileModelReader: 420716.6056712613
10/01/13 10:22:11 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2/trainer-thetaNormalizer/part-00000
10/01/13 10:22:11 INFO io.SequenceFileModelReader: hdfs://localhost:9000/user/loekcleophas/8newsmodel-0.2/trainer-tfIdf/trainer-tfIdf/part-00000
comp.windows.x -4443829.798557077 7727496.583973498 -0.5750671967650419
comp.graphics -3252365.124498224 7727496.583973498 -0.4208821174044246
soc.religion.christian -5106741.34456479 7727496.583973498 -0.6608532645819548
alt.atheism -3447983.6168798 7727496.583973498 -0.44619671835646907
misc.forsale -2276588.3662840202 7727496.583973498 -0.2946087832643716
comp.sys.mac.hardware -2445489.855812473 7727496.583973498 -0.31646598988918556
comp.os.ms-windows.misc -7727496.583973498 7727496.583973498 -1.0
comp.sys.ibm.pc.hardware -2687646.590023761 7727496.583973498 -0.3478030123750332
10/01/13 10:23:17 INFO bayes.TestClassifier:
nCalls = 0;
sumTime = 0.0s;
minTime = 0.0ms;
maxTime = 0.0ms;
meanTime = 0.0ms;
stdDevTime = 0.0ms;
10/01/13 10:23:18 INFO bayes.TestClassifier:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances          :          0             ?%
Incorrectly Classified Instances        :          0             ?%
Total Classified Instances              :          0

=======================================================
Confusion Matrix
-------------------------------------------------------
a       b       c       d       e       f       g       h       <--Classified as
0       0       0       0       0       0       0       0        |  0        a     = comp.windows.x
0       0       0       0       0       0       0       0        |  0        b     = comp.graphics
0       0       0       0       0       0       0       0        |  0        c     = soc.religion.christian
0       0       0       0       0       0       0       0        |  0        d     = alt.atheism
0       0       0       0       0       0       0       0        |  0        e     = misc.forsale
0       0       0       0       0       0       0       0        |  0        f     = comp.sys.mac.hardware
0       0       0       0       0       0       0       0        |  0        g     = comp.os.ms-windows.misc
0       0       0       0       0       0       0       0        |  0        h     = comp.sys.ibm.pc.hardware
Default Category: unknown: 8




