FYI: This has been done in trunk in r1466216 (https://issues.apache.org/jira/browse/CTAKES-186).
If you would like to try it out or run some benchmarks before we decide whether the default pipeline should use it, just uncomment the following in your aggregate descriptors:

<delegateAnalysisEngine key="ClearPOSTagger">
  <import location="../../../ctakes-pos-tagger/desc/ClearNLPPOSTagger.xml"/>
</delegateAnalysisEngine>

<node>ClearPOSTagger</node>

> -----Original Message-----
> From: Chen, Pei [mailto:[email protected]]
> Sent: Monday, April 08, 2013 5:14 PM
> To: [email protected]
> Subject: RE: ClearNLP POSTagger
>
> Hi Richard,
> Yes, the ClearNLP tools (POSTagger, Dependency Parser, SRL) in cTAKES
> were retrained with additional data (MiPACQ/SHARP).
> The Dependency Parser/SRL replaced the existing ones because the old
> ClearParser ones were no longer supported.
>
> The ClearPOSTagger wasn't previously available in cTAKES, but we can
> certainly make it an optional one in case some folks want to use it. I'll
> leave the default one (OpenNLP) as-is for the time being until we get some
> more users/tests/benchmarks/feedback...
>
> --Pei
>
> > -----Original Message-----
> > From: Richard Eckart de Castilho [mailto:[email protected]
> > darmstadt.de]
> > Sent: Monday, April 08, 2013 1:43 PM
> > To: <[email protected]>
> > Subject: Re: ClearNLP POSTagger
> >
> > Hi,
> >
> > did you train new models for the ClearNLP/OpenNLP tools? (Maybe I would
> > know if I had followed a past discussion on models more closely.)
> >
> > Cheers,
> >
> > -- Richard
> >
> > On 08.04.2013 at 18:15, "Chen, Pei" <[email protected]> wrote:
> >
> > > Hi,
> > > While working on the Dependency Parser/SRL labeler, we also have a
> > > POSTagger from ClearNLP. It is fairly simple and I have the code
> > > ready to be checked in (also trained on the same data as the dep
> > > parser: MiPACQ/SHARP).
> > > What do folks think:
> > > We can include both Analysis Engines in the ctakes-pos-tagger
> > > project. But should we leave the current OpenNLP in the default
> > > pipeline or default to the latest?
> > >
> > > "The ClearNLP POS tagger shows more robust results on unknown words
> > > by generalizing lexical features. You can find the reference in this
> > > paper:
> > > Fast and Robust Part-of-Speech Tagging Using Dynamic Model
> > > Selection, Jinho D. Choi, Martha Palmer, Proceedings of the 50th
> > > Annual Meeting of the Association for Computational Linguistics
> > > (ACL'12), 363-367, Jeju, Korea, 2012. [1]
> > > It also uses AdaGrad for machine learning, which is a more
> > > advanced learning algorithm than maximum entropy used by OpenNLP."
> > >
> > > [1] http://aclweb.org/anthology-new/P/P12/P12-2071.pdf
> >
> > --
> > -------------------------------------------------------------------
> > Richard Eckart de Castilho
> > Technical Lead
> > Ubiquitous Knowledge Processing Lab (UKP-TUD)
> > FB 20 Computer Science Department
> > Technische Universität Darmstadt
> > Hochschulstr. 10, D-64289 Darmstadt, Germany
> > phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
> > [email protected]
> > www.ukp.tu-darmstadt.de
> > Web Research at TU Darmstadt (WeRC)
> > www.werc.tu-darmstadt.de
> > -------------------------------------------------------------------
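
Note on placement: the two fragments at the top of this message normally live in different parts of a UIMA aggregate descriptor. The delegate import belongs under the delegate specifiers, and the node entry belongs in the flow. The sketch below is only an illustration under that assumption; the aggregate name is hypothetical, the placeholder comments stand in for whatever your descriptor already contains, and the actual cTAKES aggregate descriptors may be organized differently (for example, the tagger may need to run after tokenization and before components that consume POS tags).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<analysisEngineDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
  <primitive>false</primitive>

  <delegateAnalysisEngineSpecifiers>
    <!-- existing delegates of your pipeline stay here unchanged -->

    <!-- delegate taken from the message above -->
    <delegateAnalysisEngine key="ClearPOSTagger">
      <import location="../../../ctakes-pos-tagger/desc/ClearNLPPOSTagger.xml"/>
    </delegateAnalysisEngine>
  </delegateAnalysisEngineSpecifiers>

  <analysisEngineMetaData>
    <!-- hypothetical name, for illustration only -->
    <name>ExampleAggregateWithClearPOSTagger</name>
    <flowConstraints>
      <fixedFlow>
        <!-- existing nodes that must run before POS tagging -->
        <node>ClearPOSTagger</node>
        <!-- existing nodes that consume POS tags -->
      </fixedFlow>
    </flowConstraints>
  </analysisEngineMetaData>
</analysisEngineDescription>
```

The key in the node entry has to match the key attribute of the delegate, which is why both fragments use ClearPOSTagger.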
