Hi Richard,

Yes, the ClearNLP tools (POS tagger, dependency parser, SRL) in cTAKES were retrained with additional data (MiPAQ/SHARP). The dependency parser and SRL replaced the existing ones because the old ClearParser versions were no longer supported.
The ClearPOSTagger wasn't previously available in cTAKES, but we can certainly make it optional in case some folks want to use it. I'll leave the default one (OpenNLP) as-is for the time being, until we get more users/tests/benchmarks/feedback...

--Pei

> -----Original Message-----
> From: Richard Eckart de Castilho [mailto:[email protected]darmstadt.de]
> Sent: Monday, April 08, 2013 1:43 PM
> To: <[email protected]>
> Subject: Re: ClearNLP POSTagger
>
> Hi,
>
> did you train new models for the ClearNLP/OpenNLP tools? (Maybe I would
> know if I had followed a past discussion on models more closely.)
>
> Cheers,
>
> -- Richard
>
> On 08.04.2013 at 18:15, "Chen, Pei" <[email protected]> wrote:
>
> > Hi,
> > While working on the Dependency Parser/SRL labeler, we also have a
> > POSTagger from ClearNLP. It is fairly simple and I have the code ready
> > to be checked in (also trained on the same data as the dep parser,
> > MiPaq/SHARP).
> > What do folks think:
> > We can include both Analysis Engines in the ctakes-pos-tagger project.
> > But should we leave the current OpenNLP tagger in the default pipeline,
> > or default to the latest?
> >
> > "The ClearNLP POS tagger shows more robust results on unknown words
> > by generalizing lexical features. You can find the reference in this
> > paper:
> > Fast and Robust Part-of-Speech Tagging Using Dynamic Model Selection,
> > Jinho D. Choi, Martha Palmer, Proceedings of the 50th Annual Meeting of
> > the Association for Computational Linguistics (ACL'12), 363-367, Jeju,
> > Korea, 2012. [1]
> > It also uses AdaGrad for machine learning, which is a more advanced
> > learning algorithm than maximum entropy used by OpenNLP."
> >
> > [1] http://aclweb.org/anthology-new/P/P12/P12-2071.pdf
>
> --
> -------------------------------------------------------------------
> Richard Eckart de Castilho
> Technical Lead
> Ubiquitous Knowledge Processing Lab (UKP-TUD)
> FB 20 Computer Science Department
> Technische Universität Darmstadt
> Hochschulstr. 10, D-64289 Darmstadt, Germany
> phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
> [email protected]
> www.ukp.tu-darmstadt.de
> Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
> -------------------------------------------------------------------
