I'll take a look at the patch. Also be aware of https://issues.apache.org/jira/browse/CTAKES-31, which describes a way to improve performance if we're willing to assume the annotations (currently BaseTokens) are sorted. Right now they are always BaseTokens and always sorted; I'm just not sure we want to code to that assumption.
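To illustrate why the sorted-order assumption can matter, here is a minimal Java sketch. It is not the CTAKES-31 proposal or any cTAKES/UIMA API: `SortedTokenLookup`, `Token`, `lowerBound`, and `tokensInSpan` are hypothetical names, and the sketch assumes the token list is already sorted by begin offset. Under that assumption, a binary search finds the first candidate token in O(log n), and the scan can stop at the first token that starts past the span instead of checking every token in the document.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch only: assumes tokens are sorted by begin offset.
 * Token is a stand-in, not cTAKES' BaseToken.
 */
public class SortedTokenLookup {

    /** Hypothetical minimal token covering the character span [begin, end). */
    static final class Token {
        final int begin;
        final int end;

        Token(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** Returns the index of the first token whose begin is >= offset. */
    static int lowerBound(List<Token> sorted, int offset) {
        int lo = 0;
        int hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted.get(mid).begin < offset) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Returns the tokens lying entirely inside [spanBegin, spanEnd). */
    static List<Token> tokensInSpan(List<Token> sorted, int spanBegin, int spanEnd) {
        List<Token> result = new ArrayList<>();
        for (int i = lowerBound(sorted, spanBegin); i < sorted.size(); i++) {
            Token t = sorted.get(i);
            if (t.begin >= spanEnd) {
                break; // sorted by begin, so no later token can start inside the span
            }
            if (t.end <= spanEnd) {
                result.add(t);
            }
        }
        return result;
    }
}
```

Without the sorting guarantee, the same lookup has to examine every token, which is the trade-off in question: the faster path only stays correct as long as whatever supplies the annotations keeps them in order.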
________________________________________
From: ctakes-dev-return-1137-Masanz.James=mayo....@incubator.apache.org [ctakes-dev-return-1137-Masanz.James=mayo....@incubator.apache.org] on behalf of Tim Miller [[email protected]]
Sent: Monday, February 04, 2013 3:43 PM
To: [email protected]
Subject: assistance with dictionary lookup issue

Pei helped me track down a performance issue I'd noticed in the dictionary annotator, and I have filed it here: https://issues.apache.org/jira/browse/CTAKES-143

I implemented a quick-and-dirty proof-of-concept fix and saw a dramatic performance improvement. I attached the patch to the issue, but it changes an interface (and does not yet try to update the other implementing classes, so it is obviously not ready for prime time). Before going further, I wanted to ask the list in case anyone with better knowledge of that module has better engineering ideas than the one I came up with.

Thanks,
--
Tim Miller, PhD
Postdoctoral Research Fellow
Children's Hospital Informatics Program
Children's Hospital Boston and Harvard Medical School
617-919-1223
