Fixed! Done. Svn up.
https://issues.apache.org/jira/browse/MAHOUT-296
robin

On Thu, Feb 18, 2010 at 4:19 PM, Robin Anil <[email protected]> wrote:

> Yeah. It definitely shouldn't be. I will post a fix soon (I am at work right
> now). Meanwhile, you can look at the test classifier code and run the
> classifier programmatically.
> It's as easy as setting the params, instantiating a classifier context,
> and sending it files one by one.
>
> Robin
>
> On Thu, Feb 18, 2010 at 4:15 PM, Loek Cleophas <[email protected]> wrote:
>
>> Thank you Robin. The stack trace I got:
>>
>> Exception in thread "main" java.lang.NullPointerException
>>   at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:100)
>>   at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:117)
>>   at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:122)
>>   at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:88)
>>   at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:63)
>>   at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:289)
>>   at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:204)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>   at java.lang.reflect.Method.invoke(Method.java:597)
>>   at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>
>> Command line was:
>>
>> bin/hadoop jar
>> ~/Downloads/mahout-0.2/examples/target/mahout-examples-0.2.job
>> org.apache.mahout.classifier.bayes.TestClassifier -m
>> docs-klg-n3-wordLevel-complementary -d
>> ~/Code/klg/indextrainingvalidation/docs-klg-mahout-validate -ng 3 -type
>> cbayes -source hdfs -method sequential
>>
>> It did read the model in correctly - and when I substitute a non-existing
>> input directory for the one with the non-category-named .txt file, it
>> indeed runs normally (classifying 0 instances).
>>
>> I presume it should be easy to reproduce - if not, let me know and I can
>> see whether I can give you our small test data set or some small subset of
>> it that I can reproduce it with.
>>
>> Regards,
>> Loek
>>
>> On Feb 18, 2010, at 11:25, Robin Anil wrote:
>>
>>> I will look into this.
>>>
>>> On Thu, Feb 18, 2010 at 3:42 PM, Loek Cleophas <[email protected]> wrote:
>>>
>>>> Hi
>>>>
>>>> While playing around some more with the 20newsgroups example code for
>>>> the Bayes classifiers, I ran into an oddity and a presumable bug:
>>>>
>>>> Instead of using (parts of) the 20 newsgroups data set, which was split
>>>> nicely into one file per newsgroup, with the 'category, tab, tokens'
>>>> line format, I generated such a file out of our company data set. What
>>>> I did, though, was generate 1 file to train and 1 to test with - so both
>>>> files could have different lines having different categories, e.g.
>>>>
>>>> cars Ferrari red ....
>>>> animals cow cat dog ....
>>>>
>>>> In training, this works fine. In testing, it crashes TestClassifier
>>>> with a null pointer exception. I presume that is because either the
>>>> file name does not match category.txt for some category name, or
>>>> because there are multiple categories being used inside the single
>>>> file - but I also presume that neither should crash the thing :)
>>>>
>>>> It also brings up the question: if the line format in the data files
>>>> has the category in there, then why are the file names relevant at all?
>>>> Seems like redundancy to me. Shouldn't TestClassifier merely take all
>>>> .txt files in the input data directory and process their contents?
>>>>
>>>> Regards,
>>>> Loek
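
On the question of whether the file names need to matter at all: below is a minimal sketch, assuming the 'category, tab, tokens' line format described above, of taking the category from each line instead of from the file name. The names LabelledLineParser and groupByCategory are hypothetical and used only for illustration. They are not part of Mahout, and this is not how TestClassifier is actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Mahout: groups labelled lines by category. */
public class LabelledLineParser {

    /**
     * Parses lines of the assumed form "category<TAB>token token ..." and
     * groups the token lists by the category found in each line, so the
     * name of the file the lines came from plays no role.
     */
    public static Map<String, List<List<String>>> groupByCategory(List<String> lines) {
        Map<String, List<List<String>>> byCategory = new LinkedHashMap<>();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (tab <= 0) {
                continue; // skip lines with no tab or an empty category
            }
            String category = line.substring(0, tab);
            String rest = line.substring(tab + 1).trim();
            List<String> tokens = new ArrayList<>();
            if (!rest.isEmpty()) {
                tokens.addAll(Arrays.asList(rest.split("\\s+")));
            }
            // Create the per-category list on first sight instead of assuming
            // it already exists, so an unseen category cannot cause a null lookup.
            byCategory.computeIfAbsent(category, k -> new ArrayList<>()).add(tokens);
        }
        return byCategory;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "cars\tFerrari red",
            "animals\tcow cat dog");
        System.out.println(groupByCategory(lines));
    }
}
```

With this approach a single test file can mix categories freely, which is the situation that triggered the crash reported in this thread.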
