Thank you for the feedback. I'm not sure whether the report I have in mind for the POS Tagger would also apply to DocCat. I attached an example output to the Jira issue: https://issues.apache.org/jira/browse/OPENNLP-449

On Sun, Feb 26, 2012 at 8:36 PM, Jörn Kottmann <[email protected]> wrote:

> +1 also needed for doccat.
>
> Maybe it can be created by a class which could also
> be used for doccat.
>
> Jörn
>
> On 02/26/2012 03:13 AM, Jason Baldridge wrote:
>
>> +1 Fine-grained error analysis FTW!
>>
>> On Sat, Feb 25, 2012 at 4:57 PM, [email protected] <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I implemented a new EvaluationMonitor for the POS Tagger. It generates
>>> a confusion matrix <http://en.wikipedia.org/wiki/Confusion_matrix>
>>> for each token that was not tagged properly.
>>>
>>> Example output (Portuguese):
>>>
>>> ...
>>> Accuracy for [que]: 91,34%
>>> 1316 ocurrencies. Confusion matrix (line: reference; column: predicted):
>>>           | conj-s | pron-indp | adv | pron-det || % Accu ||
>>>    conj-s |>   537<|        40 |   0 |        0 || 93,07% ||
>>> pron-indp |     59 |>      661<|   0 |        0 || 91,81% ||
>>>       adv |      2 |        12 |>  4<|        0 || 22,22% ||
>>>  pron-det |      0 |         1 |   0 |>       0<||     0% ||
>>>
>>> Accuracy for [o]: 98,48%
>>> 3949 ocurrencies. Confusion matrix (line: reference; column: predicted):
>>>           |  art | pron-det | pron-pers | , || % Accu ||
>>>       art |>3857<|        4 |         0 | 1 || 99,87% ||
>>>  pron-det |   36 |>      24<|         0 | 0 ||    40% ||
>>> pron-pers |   19 |        0 |>        8<| 0 || 29,63% ||
>>>         , |    0 |        0 |         0 |>0<||     0% ||
>>>
>>> Accuracy for [a]: 96%
>>> 4395 ocurrencies. Confusion matrix (line: reference; column: predicted):
>>>           |  art | prp | pron-pers | pron-det || % Accu ||
>>>       art |>3291<|  54 |         0 |        0 || 98,39% ||
>>>       prp |  107 |>922<|         0 |        0 ||  89,6% ||
>>> pron-pers |    4 |   0 |>        4<|        0 ||    50% ||
>>>  pron-det |   11 |   0 |         0 |>       2<|| 15,38% ||
>>> ...
>>>
>>> Do you think it would be interesting to make this report available?
>>> I would add it to the CLI, and it would be activated by a new argument
>>> that passes in an output file for the report.
>>>
>>> Thank you,
>>> William
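
To make the idea in the quoted proposal concrete, here is a rough sketch of how a per-token confusion matrix could be accumulated. It is written in Python purely for illustration and is not the OpenNLP EvaluationMonitor code; the function names and the (token, reference tag, predicted tag) input format are assumptions made for this example.

```python
from collections import Counter, defaultdict


def per_token_confusion(samples):
    """Build one confusion matrix per token.

    `samples` is an iterable of (token, reference_tag, predicted_tag)
    triples; this input format is an assumption made for the sketch.
    Returns {token: Counter({(reference_tag, predicted_tag): count})}.
    """
    matrices = defaultdict(Counter)
    for token, reference, predicted in samples:
        matrices[token][(reference, predicted)] += 1
    return matrices


def token_accuracy(matrix):
    """Fraction of occurrences on the diagonal (reference == predicted)."""
    total = sum(matrix.values())
    correct = sum(n for (ref, pred), n in matrix.items() if ref == pred)
    return correct / total if total else 0.0


def confused_tokens(matrices):
    """Return only the tokens with at least one error, most errors first."""
    def errors(matrix):
        return sum(n for (ref, pred), n in matrix.items() if ref != pred)

    return sorted(
        (tok for tok, m in matrices.items() if errors(m) > 0),
        key=lambda tok: errors(matrices[tok]),
        reverse=True,
    )
```

Keying the counts by (reference, predicted) pairs keeps the structure independent of the tag set, which may be one way to share the same class between the POS Tagger and DocCat, where the "tags" would be categories.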
