This is all from memory: in the default pipeline, noun phrases are turned into LookupWindow annotations.
The LookupDesc*.xml file being used describes what type of annotation is used for a lookup window. Sentence is (or at least was at one time) too big to be reasonable, both for performance reasons and because it would cause words from unrelated parts of a sentence to be considered together too often.

LookupWindow annotations were designed to be the input to the dictionary lookup component. The default pipeline creates LookupWindow annotations from noun phrases (including NP PP NP patterns) so that verb phrases, etc., are not searched.

I believe the overlap annotator is used in order to keep things modular and not dependent on previous components, that is, to not assume LookupWindows are non-overlapping. Overlapping ones get merged into a single LookupWindow so that, when iterating through all LookupWindow annotations, cTAKES won't create multiple annotations for the same mention.

-- James

> -----Original Message-----
> From: [email protected]
> [mailto:ctakes-dev-return-847-
> [email protected]] On Behalf Of Coarr, Matt
> Sent: Wednesday, November 14, 2012 4:20 PM
> To: [email protected]
> Subject: what do they do? lookup window & overlap annotators
>
> Does anyone have a quick description of what the lookup window annotator
> and the overlap annotator do?
>
> I've been looking at the descriptor for LookupWindowAnnotator.xml and
> its two subcomponents (which use classes OverlapAnnotator and
> CopyAnnotator).
>
> I'm trying to get a better grasp for what they do.
>
> Thanks!
> Matt
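
To illustrate the merging step described in the reply, here is a minimal sketch of collapsing overlapping spans into one. This is not the actual OverlapAnnotator code: the Span class and the mergeOverlapping method are hypothetical names made up for the example, windows are assumed to be plain [begin, end) character offsets, and the real component may well treat nested or touching annotations differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a LookupWindow annotation: a [begin, end) character span.
final class Span {
    final int begin;
    final int end;

    Span(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }

    @Override
    public String toString() {
        return "[" + begin + "," + end + ")";
    }
}

final class LookupWindowMergeSketch {

    // Returns a new list in which spans that overlap are replaced by one span
    // covering their union. Spans that merely touch (end == next begin) are
    // kept separate; that choice is an assumption of this sketch.
    static List<Span> mergeOverlapping(List<Span> windows) {
        List<Span> sorted = new ArrayList<>(windows);
        sorted.sort(Comparator.comparingInt((Span s) -> s.begin));

        List<Span> merged = new ArrayList<>();
        for (Span current : sorted) {
            int lastIndex = merged.size() - 1;
            if (lastIndex >= 0 && current.begin < merged.get(lastIndex).end) {
                // Overlaps the previous window: extend it instead of adding a new one.
                Span last = merged.remove(lastIndex);
                merged.add(new Span(last.begin, Math.max(last.end, current.end)));
            } else {
                merged.add(current);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Two overlapping noun-phrase windows and one separate window.
        List<Span> windows = Arrays.asList(new Span(0, 10), new Span(5, 20), new Span(30, 40));
        System.out.println(mergeOverlapping(windows)); // prints [[0,20), [30,40)]
    }
}
```

With the windows merged up front, a downstream lookup that iterates over each window sees any given stretch of text only once, which is why the same mention would not be annotated twice.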
