Fixed! Done. Svn up.

https://issues.apache.org/jira/browse/MAHOUT-296

robin

On Thu, Feb 18, 2010 at 4:19 PM, Robin Anil <[email protected]> wrote:

> Yeah. It definitely shouldn't be. I will post a fix soon (I am at work right
> now). Meanwhile, you can look at the test classifier code and run the
> classifier programmatically.
> It's as easy as setting the params, instantiating a classifier context,
> and sending it files one by one.
>
> Robin
>
>
>
> On Thu, Feb 18, 2010 at 4:15 PM, Loek Cleophas
> <[email protected]> wrote:
>
>> Thank you Robin. The stack trace I got:
>>
>> Exception in thread "main" java.lang.NullPointerException
>>        at org.apache.mahout.classifier.ConfusionMatrix.getCount(ConfusionMatrix.java:100)
>>        at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:117)
>>        at org.apache.mahout.classifier.ConfusionMatrix.incrementCount(ConfusionMatrix.java:122)
>>        at org.apache.mahout.classifier.ConfusionMatrix.addInstance(ConfusionMatrix.java:88)
>>        at org.apache.mahout.classifier.ResultAnalyzer.addInstance(ResultAnalyzer.java:63)
>>        at org.apache.mahout.classifier.bayes.TestClassifier.classifySequential(TestClassifier.java:289)
>>        at org.apache.mahout.classifier.bayes.TestClassifier.main(TestClassifier.java:204)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>
>> Command line was: bin/hadoop jar
>> ~/Downloads/mahout-0.2/examples/target/mahout-examples-0.2.job
>> org.apache.mahout.classifier.bayes.TestClassifier -m
>> docs-klg-n3-wordLevel-complementary -d
>> ~/Code/klg/indextrainingvalidation/docs-klg-mahout-validate -ng 3 -type
>> cbayes -source hdfs -method sequential
>>
>> It did read the model in correctly - and when I substitute a non-existing
>> input directory for the one with the non-category-named .txt file, it indeed
>> runs normally (classifying 0 instances).
>>
>> I presume it should be easy to reproduce - if not, let me know and I can
>> see whether I can give you our small test data set or some small subset of
>> it that I can reproduce it with.
>>
>> Regards,
>> Loek
>>
>>
>> On Feb 18, 2010, at 11:25, Robin Anil wrote:
>>
>>> I will look into this.
>>>
>>> On Thu, Feb 18, 2010 at 3:42 PM, Loek Cleophas
>>> <[email protected]> wrote:
>>>
>>>> Hi
>>>>
>>>> While playing around some more with the 20newsgroups example code for
>>>> the Bayes classifiers, I ran into an oddity and what is presumably a
>>>> bug:
>>>>
>>>> Instead of using (parts of) the 20 newsgroups data set, which was split
>>>> nicely into one file per newsgroup with the 'category, tab, tokens' line
>>>> format, I generated such a file from our company data set. What I did,
>>>> though, was generate one file to train with and one to test with, so
>>>> each file can contain lines with different categories, e.g.
>>>>
>>>> cars    Ferrari red ....
>>>> animals cow cat dog ....
>>>>
>>>> In training, this works fine. In testing, it crashes TestClassifier
>>>> with a null pointer exception. I presume that is because either the
>>>> file name does not match category.txt for some category name, or
>>>> because there are multiple categories inside the single file. I also
>>>> presume that neither should crash the thing :) It also raises the
>>>> question: if the line format in the data files already contains the
>>>> category, then why are the file names relevant at all? Seems like
>>>> redundancy to me. Shouldn't TestClassifier merely take all .txt files
>>>> in the input data directory and process their contents?
>>>>
>>>> Regards,
>>>> Loek
>>>>
>>>>
>>
>

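For anyone reading the thread later: the stack trace ends in ConfusionMatrix.getCount, which suggests (an assumption; the thread does not confirm the root cause) that the matrix was asked about a label it had never registered. The sketch below is not Mahout's code or the MAHOUT-296 patch. It is a hypothetical, self-contained illustration of two ideas from the discussion: taking the category from the 'category, tab, tokens' line rather than from the file name, and having a count lookup return 0 for an unseen label instead of failing. The class name LenientConfusionMatrix, the parseLine helper, and the stand-in "classifier" in main are all invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch; NOT Mahout's ConfusionMatrix. */
class LenientConfusionMatrix {
    // correct label -> (classified label -> count)
    private final Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();

    /** Returns 0 for a label pair that was never recorded, rather than dereferencing a missing entry. */
    int getCount(String correctLabel, String classifiedLabel) {
        Map<String, Integer> row = counts.get(correctLabel);
        if (row == null) {
            return 0;
        }
        return row.getOrDefault(classifiedLabel, 0);
    }

    /** Records one classified instance, creating the row for a new label on demand. */
    void addInstance(String correctLabel, String classifiedLabel) {
        counts.computeIfAbsent(correctLabel, k -> new LinkedHashMap<>())
              .merge(classifiedLabel, 1, Integer::sum);
    }

    /**
     * Splits a "category<TAB>tokens" line into {category, tokens}.
     * Returns null when the line has no tab or an empty category.
     */
    static String[] parseLine(String line) {
        int tab = line.indexOf('\t');
        if (tab <= 0) {
            return null;
        }
        return new String[] { line.substring(0, tab), line.substring(tab + 1).trim() };
    }

    public static void main(String[] args) {
        // Two lines from a single mixed-category test file, as in the original report.
        String[] lines = { "cars\tFerrari red", "animals\tcow cat dog" };
        LenientConfusionMatrix matrix = new LenientConfusionMatrix();
        for (String line : lines) {
            String[] parts = parseLine(line);
            if (parts == null) {
                continue; // skip malformed lines instead of crashing
            }
            // Stand-in for a real classifier: label everything "cars".
            String predicted = "cars";
            matrix.addInstance(parts[0], predicted);
        }
        System.out.println("cars classified as cars: " + matrix.getCount("cars", "cars"));
        System.out.println("animals classified as cars: " + matrix.getCount("animals", "cars"));
        // A label derived from a file name that matches no category yields 0, not an exception.
        System.out.println("unknown label: " + matrix.getCount("docs-klg-mahout-validate", "cars"));
    }
}
```

With this shape the file name plays no role in scoring, so a test file named after no category is harmless. Whether the actual fix took this route is not stated in the thread; see the JIRA issue for what was committed.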