On Thu, 2015-11-12 at 15:43 +0000, Russ, Daniel (NIH/CIT) [E] wrote:
> 1) I use the old SourceForge models. I find that the sources of error
> in my analysis are usually not due to mistakes in sentence detection
> or POS tagging. I don’t have the annotated data or the time/money to
> build custom models. Yes, the text I analyze is quite different from
> the training corpus (WSJ? or whatever corpus was used to build the
> models), but it is good enough.
That is interesting, I wasn't aware that those models are still useful. It really depends on the component as well; I was mostly thinking about the name finder models when I wrote that. Do you only use the Sentence Detector, Tokenizer, and POS Tagger?

You could use OntoNotes (almost for free) to train models. Maybe we should look into distributing models trained on OntoNotes.

Jörn