+1 false. I just wonder what side effects there might be to tweaking LVG.

On Fri, Apr 18, 2014 at 11:56 AM, Finan, Sean <[email protected]> wrote:

> +1 false
>
> -----Original Message-----
> From: Miller, Timothy [mailto:[email protected]]
> Sent: Friday, April 18, 2014 2:54 PM
> To: [email protected]
> Subject: Re: lvg entries
>
> Thanks for tracking that down Andy.
>
> I am making a pass at UimaFit-izing the configuration parameters for all
> the annotators in the default pipeline, before I create the static factory
> methods like we recently discussed. Should I go ahead and change this to
> make default behavior be false?
>
> Tim
>
>
> On 04/18/2014 12:47 AM, andy mcmurry wrote:
> > There is a lot of config handling, maybe PostLemmas is being set to
> > true or configInit() is not setting up the NLM wrapper incorrectly.
> >
> > ctakes-lvg *README*
> > Note: as distributed, PostLemmas is set to false. This is done to
> > reduce the size of the CAS.
> > Set PostLemmas to true to have org.apache.ctakes.typesystem.type.Lemma
> > annotations added to the CAS.
> >
> > *LvgAnnotator.xml*
> > PostLemmas = True
> >
> > *LvgAnnotator.java*
> > if (postLemmas) {
> >     lvgResource.getLvgLex()
> > }
> >
> >
> > On Thu, Apr 17, 2014 at 3:23 PM, Masanz, James J. <[email protected]> wrote:
> >
> >> The normalizedForm field is filled in. It is used by dictionary lookup.
> >>
> >> So, for example, if the dictionary would contain "lymph node" but not
> >> "lymph nodes", a document with text of "lymph nodes" would match the
> >> dictionary entry "lymph node" because "node", being the normalized
> >> form of "nodes", would be used when searching dictionary entries (in
> >> addition to searching dictionary entries for "nodes")
> >>
> >> -----Original Message-----
> >> From: Miller, Timothy [mailto:[email protected]]
> >> Sent: Thursday, April 17, 2014 4:33 PM
> >> To: [email protected]
> >> Subject: Re: lvg entries
> >>
> >> Quick follow-up since I was interested. The current dependency parser
> >> does have the option to use ctakes lemmas or do its own lemmatizing,
> >> but that doesn't use the lemma field, it uses the normalizedForm
> >> field. I'm not sure if that field is actually ever filled in -- on my
> >> example data it is always null.
> >>
> >> Tim
> >>
> >> On 04/17/2014 01:57 PM, Masanz, James J. wrote:
> >>> Offhand I recall at least one of the dependency parsers used the
> >>> Lemma annotations at one point.
> >>> Not sure if still does.
> >>>
> >>> There is an option for turning off the posting of the lemmas to the
> >>> cas.
> >>>
> >>> Hope that helps
> >>>
> >>> -----Original Message-----
> >>> From: Miller, Timothy [mailto:[email protected]]
> >>> Sent: Thursday, April 17, 2014 11:27 AM
> >>> To: [email protected]
> >>> Subject: lvg entries
> >>>
> >>> The LVG annotator creates an enormous number of "lemmas" for every
> >>> WordToken in the CAS, and I'm wondering what the original purpose
> >>> was? I think this is probably a minor bottleneck for speed but
> >>> mostly a pretty big space hog (at least 50% of the space of xmi
> >>> files in my tests).
> >>>
> >>> As of right now I'm not sure if any downstream components are using
> >>> these lemmas, and on a manual inspection the precision seems to be
> >>> pretty abysmal (meaning most of them are nonsensical as lexical
> >>> variants), so as I said, just wondering if we can revisit why cTAKES
> >>> generates so many and whether that component can be optimized.
> >>>
> >>> Thanks
> >>> Tim
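
For reference, the normalizedForm lookup James describes in the quoted thread could be sketched roughly as below. This is only an illustration under stated assumptions, not the actual cTAKES dictionary lookup code: the class name, the NORMALIZED map (a stand-in for whatever LVG produces), and the one-entry DICTIONARY are all hypothetical, and matching is done on the whole phrase rather than token by token.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names come from cTAKES.
class NormalizedLookupSketch {
    // Assumed stand-in for LVG output: surface form -> normalized form.
    static final Map<String, String> NORMALIZED = new HashMap<>();
    static {
        NORMALIZED.put("nodes", "node");
    }

    // Assumed toy dictionary that has "lymph node" but not "lymph nodes".
    static final Set<String> DICTIONARY =
            new HashSet<>(Arrays.asList("lymph node"));

    // Replace each token with its normalized form when one is known.
    static String normalize(String phrase) {
        StringBuilder sb = new StringBuilder();
        for (String token : phrase.toLowerCase().split("\\s+")) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(NORMALIZED.getOrDefault(token, token));
        }
        return sb.toString();
    }

    // Search with the original text first, then with the normalized text.
    static boolean matches(String phrase) {
        return DICTIONARY.contains(phrase.toLowerCase())
                || DICTIONARY.contains(normalize(phrase));
    }

    public static void main(String[] args) {
        System.out.println(matches("lymph nodes"));  // prints "true"
        System.out.println(matches("lymph glands")); // prints "false"
    }
}
```

The point of the sketch is that this lookup depends only on the normalized form, which is separate from the Lemma annotations that PostLemmas controls.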
