RE: ClearNLP POSTagger

Chen, Pei Mon, 08 Apr 2013 12:21:47 -0700

Okay, 
I'll commit the ClearPOSTagger and make it available in the ctakes-pos-tagger 
component, but leave everything as it currently is (the default remains 
OpenNLP).
We can always switch the default in the future, once there is a fair 
comparison/benchmark.


Note: I think there is a pretty significant speed improvement in the 
ClearPOSTagger as well.

> -----Original Message-----
> From: Lee Becker [mailto:[email protected]]
> Sent: Monday, April 08, 2013 2:29 PM
> To: [email protected]
> Subject: Re: ClearNLP POSTagger
> 
> On Mon, Apr 8, 2013 at 12:04 PM, Steven Bethard
> <[email protected]> wrote:
> 
> > > While working on the Dependency Parser/SRL labeler, we also have a
> > > POSTagger from ClearNLP. It is fairly simple, and I have the code
> > > ready to be checked in (it is also trained on the same data as the
> > > dependency parser: MiPaq/SHARP). What do folks think?
> > > We can include both Analysis Engines in the ctakes-pos-tagger project.
> > > But should we leave the current OpenNLP tagger in the default pipeline
> > > or default to the latest?
> >
> > My vote would be to default for whatever has the best performance.
> > Presumably the ClearNLP one?
> >
> > > "The ClearNLP POS tagger shows more robust results on unknown words
> > > by generalizing lexical features.
> >
> > Looking at the paper, the ClearNLP POS tagger is not compared directly
> > to the cTAKES OpenNLP POS tagger, but it does outperform the Stanford
> > tagger trained on the same data, so it's probably a reasonable guess
> > that it's more accurate than the OpenNLP tagger.
> >
> > > It also uses AdaGrad for machine learning, which is a more advanced
> > > learning algorithm than maximum entropy used by OpenNLP."
> >
> > My opinion is that we should never include a model in cTAKES just
> > because it has a "more advanced learning algorithm". "More advanced
> > learning algorithm" does not always translate into better performance.
> 
> 
> If my memory serves me correctly, I think Jinho trained his parsers on
> predicted POS tags to eke out the extra performance. The takeaway is
> that ClearNLP does better when you can use as much of its pipeline as
> possible.
