+1 false

-----Original Message-----
From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
Sent: Friday, April 18, 2014 2:54 PM
To: dev@ctakes.apache.org
Subject: Re: lvg entries
Thanks for tracking that down, Andy. I am making a pass at UimaFit-izing the configuration parameters for all the annotators in the default pipeline, before I create the static factory methods like we recently discussed. Should I go ahead and change this to make the default behavior false?

Tim

On 04/18/2014 12:47 AM, andy mcmurry wrote:
> There is a lot of config handling; maybe PostLemmas is being set to
> true, or configInit() is not setting up the NLM wrapper correctly.
>
> ctakes-lvg *README*
> Note: as distributed, PostLemmas is set to false. This is done to
> reduce the size of the CAS.
> Set PostLemmas to true to have org.apache.ctakes.typesystem.type.Lemma
> annotations added to the CAS.
>
> *LvgAnnotator.xml*
> PostLemmas = True
>
> *LvgAnnotator.java*
> if (postLemmas) {
>     lvgResource.getLvgLex()
> }
>
> On Thu, Apr 17, 2014 at 3:23 PM, Masanz, James J.
> <masanz.ja...@mayo.edu> wrote:
>
>> The normalizedForm field is filled in. It is used by dictionary lookup.
>>
>> So, for example, if the dictionary contains "lymph node" but not
>> "lymph nodes", a document with the text "lymph nodes" would match the
>> dictionary entry "lymph node", because "node", being the normalized
>> form of "nodes", would be used when searching dictionary entries (in
>> addition to searching dictionary entries for "nodes").
>>
>> -----Original Message-----
>> From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
>> Sent: Thursday, April 17, 2014 4:33 PM
>> To: dev@ctakes.apache.org
>> Subject: Re: lvg entries
>>
>> Quick follow-up since I was interested. The current dependency parser
>> does have the option to use cTAKES lemmas or do its own lemmatizing,
>> but that doesn't use the lemma field; it uses the normalizedForm
>> field. I'm not sure if that field is actually ever filled in -- on my
>> example data it is always null.
>>
>> Tim
>>
>> On 04/17/2014 01:57 PM, Masanz, James J. wrote:
>>> Offhand I recall at least one of the dependency parsers used the
>>> Lemma annotations at one point.
>>> Not sure if it still does.
>>>
>>> There is an option for turning off the posting of the lemmas to the CAS.
>>>
>>> Hope that helps
>>>
>>> -----Original Message-----
>>> From: Miller, Timothy [mailto:timothy.mil...@childrens.harvard.edu]
>>> Sent: Thursday, April 17, 2014 11:27 AM
>>> To: dev@ctakes.apache.org
>>> Subject: lvg entries
>>>
>>> The LVG annotator creates an enormous number of "lemmas" for every
>>> WordToken in the CAS, and I'm wondering what the original purpose
>>> was. I think this is probably a minor bottleneck for speed but
>>> mostly a pretty big space hog (at least 50% of the space of xmi
>>> files in my tests).
>>>
>>> As of right now I'm not sure if any downstream components are using
>>> these lemmas, and on a manual inspection the precision seems to be
>>> pretty abysmal (meaning most of them are nonsensical as lexical
>>> variants), so as I said, just wondering if we can revisit why cTAKES
>>> generates so many and whether that component can be optimized.
>>>
>>> Thanks
>>> Tim