Re: assistance with dictionary lookup issue

Tim Miller Tue, 05 Feb 2013 07:11:27 -0800

Yeah, if you mean just changing the loop to iterate over the list instead of getting an iterator, that makes sense. There is still some logic in there to leave out punctuation tokens, but I think you were implying that to be in your mockup diff.


As for sorting, the AnnotationIndex defines a sort order for its iterators:
http://uima.apache.org/d/uimaj-2.4.0/apidocs/org/apache/uima/cas/text/AnnotationIndex.html

so we are safe assuming that anything extending Annotation will be iterated in sorted order. Does that answer the questions we had? Or was I missing something subtle in that discussion?
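As we read the linked AnnotationIndex documentation, the default order is begin offset ascending, then end offset descending (so a longer span starting at the same offset comes first), with type priorities breaking any remaining ties. The sketch below models only the first two keys with a plain comparator. `Span` is a hypothetical stand-in for a UIMA Annotation, and this is an illustration of the ordering, not UIMA's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/**
 * Simplified model of the default annotation index order: begin offset
 * ascending, then end offset descending (longer spans first). UIMA also
 * breaks remaining ties by type priority; that step is omitted here.
 */
class AnnotationOrder {

    /** Hypothetical stand-in for a UIMA Annotation: only the offsets matter. */
    static final class Span {
        final int begin;
        final int end;
        final String label;

        Span(int begin, int end, String label) {
            this.begin = begin;
            this.end = end;
            this.label = label;
        }

        @Override
        public String toString() {
            return label + "[" + begin + "," + end + ")";
        }
    }

    /** Begin ascending; for equal begins, the span with the larger end first. */
    static final Comparator<Span> INDEX_ORDER = (a, b) ->
            a.begin != b.begin
                    ? Integer.compare(a.begin, b.begin)
                    : Integer.compare(b.end, a.end);

    /** True if every adjacent pair is already in INDEX_ORDER. */
    static boolean isInIndexOrder(List<Span> spans) {
        for (int i = 1; i < spans.size(); i++) {
            if (INDEX_ORDER.compare(spans.get(i - 1), spans.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Span> spans = new ArrayList<>(Arrays.asList(
                new Span(5, 9, "token"),
                new Span(0, 4, "token"),
                new Span(0, 9, "chunk")));
        Collections.sort(spans, INDEX_ORDER);
        System.out.println(spans); // [chunk[0,9), token[0,4), token[5,9)]
    }
}
```

Code that relies on sortedness only needs the invariant checked by `isInIndexOrder`; the index guarantees it, so callers need not re-sort.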


Tim

On 02/05/2013 09:44 AM, Masanz, James J. wrote:

Looks good to me, with one question.

Instead of getting an iterator and then building a new list, can we just skip 
getting the iterator and use the list that selectCovered returns?

I will mock up a diff here of what I mean:
-       Iterator btaItr = org.uimafit.util.JCasUtil.selectCovered(jcas, BaseToken.class, covering).iterator();
-       while (btaItr.hasNext())
-               {
-                       BaseToken bta = (BaseToken) btaItr.next();
-                               ltList.add(lt);
-                       }
-               }

+       ltList = org.uimafit.util.JCasUtil.selectCovered(jcas, BaseToken.class, covering);
        
        return ltList;

I know you said it was quick and dirty at the moment. My 2 cents: unless
someone comes up with a better-engineered solution, I think we could add the
new method (with a name like getLookupTokens) and leave the old one in place so
we don't have to deprecate anything, and then phase in the change to the
various *LookupInitializerImpl classes as needed.
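One way to add a method to an interface without breaking the classes that implement it is a default method that falls back on the old one. This is a sketch under assumptions: `LookupInitializerSketch`, `LegacyInitializer`, `FastInitializer`, and the `getLookupTokenIterator` signature are hypothetical simplifications, not the actual cTAKES interface. Default methods require Java 8; on an older JDK, an abstract adapter class that implements the new method would play the same role.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical, simplified stand-in for the lookup initializer interface. */
interface LookupInitializerSketch {

    /** Existing method: every current implementation already provides this. */
    Iterator<String> getLookupTokenIterator(List<String> coveredTokens);

    /**
     * New method. The default body drains the old iterator, so classes that
     * only implement the old method keep compiling and behave as before.
     */
    default List<String> getLookupTokens(List<String> coveredTokens) {
        List<String> tokens = new ArrayList<>();
        Iterator<String> it = getLookupTokenIterator(coveredTokens);
        while (it.hasNext()) {
            tokens.add(it.next());
        }
        return tokens;
    }
}

/** A legacy implementation that was never updated: only the old method. */
class LegacyInitializer implements LookupInitializerSketch {
    @Override
    public Iterator<String> getLookupTokenIterator(List<String> coveredTokens) {
        return new ArrayList<>(coveredTokens).iterator();
    }
}

/** An updated implementation that overrides the new method directly. */
class FastInitializer implements LookupInitializerSketch {
    @Override
    public Iterator<String> getLookupTokenIterator(List<String> coveredTokens) {
        return getLookupTokens(coveredTokens).iterator();
    }

    @Override
    public List<String> getLookupTokens(List<String> coveredTokens) {
        return coveredTokens; // reuse the caller's list; no copy
    }
}
```

Callers can switch to `getLookupTokens` right away, and each implementation can be moved to a faster override on its own schedule.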

-- James

-----Original Message-----
From: ctakes-dev-return-1138-Masanz.James=mayo....@incubator.apache.org
[mailto:ctakes-dev-return-1138-Masanz.James=mayo....@incubator.apache.org]
On Behalf Of Masanz, James J.
Sent: Monday, February 04, 2013 4:01 PM
To: [email protected]
Subject: RE: assistance with dictionary lookup issue

I'll take a look at the patch. Also be aware of
https://issues.apache.org/jira/browse/CTAKES-31, which talks about a way of
enhancing performance if we are willing to assume annotations (currently
BaseTokens) are sorted. Currently it's always BaseToken and always sorted;
I'm just not sure we want to code to that assumption.
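To illustrate what sortedness can buy, here is one common technique; we don't know that it is exactly what CTAKES-31 proposes. If tokens are sorted by begin offset, the tokens covered by a span can be found with a binary search for the first token at or after the span's begin, followed by a forward scan that stops as soon as a token starts past the span's end. Offsets are plain int arrays here, and `SortedCoverLookup` is a hypothetical name; no UIMA types are used.

```java
import java.util.ArrayList;
import java.util.List;

/** Covered-token lookup that relies on tokens being sorted by begin offset. */
class SortedCoverLookup {

    /** Index of the first token whose begin >= target (begins must be sorted). */
    static int firstAtOrAfter(int[] begins, int target) {
        int lo = 0;
        int hi = begins.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (begins[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /**
     * Indices of tokens with begin >= spanBegin and end <= spanEnd.
     * A token starting after spanEnd cannot be covered, so the scan stops there.
     */
    static List<Integer> coveredIndices(int[] begins, int[] ends, int spanBegin, int spanEnd) {
        List<Integer> result = new ArrayList<>();
        for (int i = firstAtOrAfter(begins, spanBegin); i < begins.length && begins[i] <= spanEnd; i++) {
            if (ends[i] <= spanEnd) {
                result.add(i);
            }
        }
        return result;
    }
}
```

The cost per lookup is a logarithmic search plus the tokens that start inside the span, rather than a pass over every token in the document. That saving only holds if the sortedness assumption does.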

________________________________________
From: ctakes-dev-return-1137-Masanz.James=mayo....@incubator.apache.org
[ctakes-dev-return-1137-Masanz.James=mayo....@incubator.apache.org] on
behalf of Tim Miller [[email protected]]
Sent: Monday, February 04, 2013 3:43 PM
To: [email protected]
Subject: assistance with dictionary lookup issue

Pei helped me track down an issue with performance I'd noticed in the
dictionary annotator, and I have filed the issue here:
https://issues.apache.org/jira/browse/CTAKES-143

I implemented a quick and dirty proof-of-concept fix and noticed a dramatic
performance improvement. I attached the patch to the issue, but it
involves changing an interface (it currently does not try to fix the other
implementing classes, so it is obviously not ready for prime time), so I wanted
to ask the list first in case anyone with better knowledge of that module
has better engineering ideas than what I came up with.

Thanks,

--
Tim Miller, PhD
Postdoctoral Research Fellow
Children's Hospital Informatics Program
Children's Hospital Boston and Harvard Medical School
617-919-1223
