On Apr 8, 2013, at 10:15 AM, "Chen, Pei" <[email protected]> wrote:
> While working on the Dependency Parser/SRL labeler,  we also have a POSTagger 
> from ClearNLP.  It is fairly simple and I have the code ready (also trained 
> on the same data as the dep parser: MiPACQ/SHARP) to be checked in.  What do 
> folks think:
> We can include both Analysis Engines in the ctakes-pos-tagger project.  But 
> should we leave the current OpenNLP in the default pipeline or default to the 
> latest?

My vote would be to default to whichever has the best performance. Presumably 
the ClearNLP one?

> "The ClearNLP POS tagger shows more robust results on unknown words by 
> generalizing lexical features.

Looking at the paper, the ClearNLP POS tagger is not compared directly to the 
cTAKES OpenNLP POS tagger, but it does outperform the Stanford tagger trained 
on the same data, so it's probably a reasonable guess that it's more 
accurate than the OpenNLP tagger.

> It also uses AdaGrad for machine learning, which is a more advanced learning 
> algorithm than maximum entropy used by OpenNLP."

My opinion is that we should never include a model in cTAKES just because it 
has a "more advanced learning algorithm". "More advanced learning algorithm" 
does not always translate into better performance.

Steve
