Yes, I just finished implementing the confusion matrix report, just like
the one I did for the POS Tagger. I will commit it today.

I could not test it properly with the Leipzig corpus. For some reason Doccat
never fails on this corpus!
To test it effectively, I used the 20news corpus.
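In case it helps to picture the report, here is a minimal sketch of how such
a matrix can be tallied from (gold, predicted) category pairs. The class and
method names are illustrative only, not the code I am committing:

import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ConfusionMatrixSketch {

  // Rows are the gold (reference) categories, columns the predicted ones;
  // the diagonal holds the correctly classified documents.
  private final Map<String, Map<String, Integer>> counts =
      new HashMap<String, Map<String, Integer>>();
  private final Set<String> categories = new TreeSet<String>();

  public void add(String gold, String predicted) {
    categories.add(gold);
    categories.add(predicted);
    Map<String, Integer> row = counts.get(gold);
    if (row == null) {
      row = new HashMap<String, Integer>();
      counts.put(gold, row);
    }
    Integer c = row.get(predicted);
    row.put(predicted, c == null ? 1 : c + 1);
  }

  public int count(String gold, String predicted) {
    Map<String, Integer> row = counts.get(gold);
    Integer c = row == null ? null : row.get(predicted);
    return c == null ? 0 : c;
  }

  // Prints a simple table, one row per gold category.
  public void print() {
    System.out.printf("%15s", "");
    for (String predicted : categories) {
      System.out.printf("%15s", predicted);
    }
    System.out.println();
    for (String gold : categories) {
      System.out.printf("%15s", gold);
      for (String predicted : categories) {
        System.out.printf("%15d", count(gold, predicted));
      }
      System.out.println();
    }
  }
}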


2014-04-10 19:37 GMT-03:00 Jörn Kottmann <kottm...@gmail.com>:

> I thought it should be done similarly to the way POS tags are measured when
> I implemented that.
>
> A confusion matrix might also be helpful to see which categories are more
> difficult to classify for the system.
>
> Jörn
>
>
> On 04/10/2014 03:00 PM, William Colen wrote:
>
>> Actually, since we always assign a tag to each document, accuracy makes
>> sense.
>> We could implement F-1 for the individual categories.
>>
>> 2014-04-09 17:23 GMT-03:00 William Colen <william.co...@gmail.com>:
>>
>>  Hello,
>>>
>>> I was checking if there is any open issue related to Doccat, and I found
>>> this one -
>>>
>>> OPENNLP-81: Add a cli tool for the doccat evaluation support
>>>
>>> I noticed that there is already a class
>>> named DocumentCategorizerEvaluator, which is not used anywhere
>>> internally.
>>> It evaluates performance in terms of accuracy, but I believe it would be
>>> better to do it in terms of F-Measure.
>>>
>>> Any thoughts?
>>>
>>> As we are working on a major version, I think it would be OK to change
>>> it.
>>>
>>>
>>> Thank you,
>>> William
>>>
>>>
>
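For the per-category F-1 suggested in the quoted discussion, the same counts
give precision and recall directly. A minimal sketch, assuming the count()
helper and categories set from the matrix sketch above:

// Per-category precision, recall and F-1 from the confusion matrix:
// tp is the diagonal cell, fp the rest of the predicted column,
// fn the rest of the gold row.
public double f1(String category) {
  int tp = count(category, category);
  int fp = 0;
  int fn = 0;
  for (String other : categories) {
    if (!other.equals(category)) {
      fp += count(other, category); // predicted as category, gold says other
      fn += count(category, other); // gold says category, predicted as other
    }
  }
  double precision = (tp + fp) == 0 ? 0 : (double) tp / (tp + fp);
  double recall = (tp + fn) == 0 ? 0 : (double) tp / (tp + fn);
  return (precision + recall) == 0 ? 0
      : 2 * precision * recall / (precision + recall);
}

Note that micro-averaging these counts over all categories reduces to the
accuracy the current evaluator already reports, since every document receives
exactly one category.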
