Marshall Schor created UIMA-4357:
------------------------------------

             Summary: create auxiliary flattened version of index and its subtypes, automatically managed
                 Key: UIMA-4357
                 URL: https://issues.apache.org/jira/browse/UIMA-4357
             Project: UIMA
          Issue Type: Improvement
            Reporter: Marshall Schor
            Priority: Minor
             Fix For: 2.7.1SDK


UIMA indexes allow retrieving items from the CAS, trading off space (for 
indexes) for time (speed of finding items in the CAS, speed of iterating).  For 
sorted indexes over a type with subtypes, if the index isn't being modified, it 
is possible to do a one-time extraction of the items in sorted order, save 
them in an array, and iterate much more rapidly over that.  I've seen many UIMA 
flows where some annotators create and index a type (and its subtypes), after 
which the indexes for these types are not updated again, but downstream 
annotators iterate over them.  Lazily creating this kind of flattened index 
seems likely to work well in many of these cases.
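A minimal sketch of the one-time extraction, assuming each subtype's index can be modeled as a `List<T>` already sorted by a shared comparator (the class `IndexFlattener` and method `flatten` are hypothetical, not UIMA API). A `PriorityQueue` of per-subtype cursors stands in for the heap of subiterators; the flattened list is what a merged iterator would produce, but computed once instead of on every pass.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper: models each subtype index as a List<T> sorted by cmp.
final class IndexFlattener {
    private IndexFlattener() {}

    /** A cursor into one subtype's sorted list. */
    private static final class Cursor<T> {
        final List<T> items;
        int pos;
        Cursor(List<T> items) { this.items = items; }
        T current() { return items.get(pos); }
    }

    /**
     * k-way merge of the per-subtype lists into one sorted list.
     * Each poll/add on the heap is the kind of heapifyUp/Down work that a
     * merged iterator repeats on every pass over the index.
     */
    static <T> List<T> flatten(List<List<T>> subtypeIndexes, Comparator<? super T> cmp) {
        PriorityQueue<Cursor<T>> heap =
            new PriorityQueue<>((a, b) -> cmp.compare(a.current(), b.current()));
        int total = 0;
        for (List<T> idx : subtypeIndexes) {
            total += idx.size();
            if (!idx.isEmpty()) heap.add(new Cursor<>(idx));
        }
        List<T> flat = new ArrayList<>(total);
        while (!heap.isEmpty()) {
            Cursor<T> c = heap.poll();      // smallest current item across subtypes
            flat.add(c.current());
            c.pos++;
            if (c.pos < c.items.size()) {
                heap.add(c);                // re-insert with its next item
            }
        }
        return flat;
    }
}
```

For example, flattening `[1, 4, 7]`, `[2, 5]` and `[3, 6, 8]` with `Comparator.naturalOrder()` yields `[1, 2, 3, 4, 5, 6, 7, 8]`; later passes can then walk that list directly with no heap maintenance.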

It is important, I think, to preserve the same kind of 
ConcurrentModificationException detection that the existing iterators provide. 
To make this additional space-time trade-off automatic and reasonable, make 
the flattened index reachable only via a SoftReference, so the GC can reclaim 
the space if needed.
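One way to combine the SoftReference with modification checking is sketched below, assuming the underlying index bumps a modification counter on every add/remove. The class `FlatIndexCache`, its methods, and the `Supplier` used to (re)build the sorted contents are all hypothetical illustrations, not UIMA API.

```java
import java.lang.ref.SoftReference;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

// Hypothetical cache of a flattened index; names are illustrative only.
final class FlatIndexCache<T> {
    private final Supplier<List<T>> builder;   // produces the items in sorted order
    private SoftReference<Object[]> flatRef = new SoftReference<>(null);
    private int modCount;                       // bumped on every index update
    private int flatBuiltAt = -1;               // modCount when the flat copy was built

    FlatIndexCache(Supplier<List<T>> builder) { this.builder = builder; }

    /** Call on every add/remove to the underlying index. */
    void noteIndexUpdate() { modCount++; }

    /** Drop the flat copy and invalidate any live iterators (e.g. on CAS reset). */
    void reset() { flatRef.clear(); flatBuiltAt = -1; modCount++; }

    /** Returns the flat copy, rebuilding it if the GC cleared it or it is stale. */
    private Object[] flat() {
        Object[] a = flatRef.get();
        if (a == null || flatBuiltAt != modCount) {
            a = builder.get().toArray();
            flatRef = new SoftReference<>(a);
            flatBuiltAt = modCount;
        }
        return a;
    }

    Iterator<T> iterator() {
        final Object[] a = flat();      // strong reference held while iterating
        final int expected = modCount;  // snapshot for modification detection
        return new Iterator<T>() {
            int i = 0;
            public boolean hasNext() { return i < a.length; }
            @SuppressWarnings("unchecked")
            public T next() {
                if (modCount != expected) throw new ConcurrentModificationException();
                if (i >= a.length) throw new NoSuchElementException();
                return (T) a[i++];
            }
        };
    }
}
```

Because each iterator holds a strong reference to the array, the GC can only reclaim a flat copy that no iteration is currently using; the next `iterator()` call simply rebuilds it.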

Delay creating a flattened version until there's evidence that the index will 
stay unmodified for some time.  To measure that evidence, count the number of 
times an iterator over the index runs the "heapifyUp/Down" code that reorders 
the subiterators to preserve sort order.  A basic indicator may be how many 
times that occurs without an intervening update to the indexes, relative to 
the size of the index.
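The indicator described above could be tracked with a small counter like the following sketch. The class `FlattenHeuristic` and its `factor` threshold are hypothetical; the issue does not specify a threshold, so the factor is an assumed tuning knob.

```java
// Hypothetical heuristic; the threshold factor is an assumption, not from the issue.
final class FlattenHeuristic {
    private final double factor;          // flatten once heapify ops exceed factor * indexSize
    private long heapifyOpsSinceUpdate;   // work repeated since the last index update

    FlattenHeuristic(double factor) { this.factor = factor; }

    /** Called by the merged iterator each time it runs heapifyUp/Down. */
    void recordHeapify() { heapifyOpsSinceUpdate++; }

    /** Called on every add/remove to the index: the evidence of stability is lost. */
    void recordIndexUpdate() { heapifyOpsSinceUpdate = 0; }

    /** True when enough merge work has been repeated on an unchanged index. */
    boolean shouldFlatten(int indexSize) {
        return indexSize > 0 && heapifyOpsSinceUpdate > factor * indexSize;
    }
}
```

If each `next()` on the merged iterator triggers roughly one heapify step (an assumption), a factor of 2.0 corresponds to about two full passes over an unchanged index before the flattened copy is built.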

The flattened array can save a bit more time by holding references to the Java 
cover class instances (JCas or non-JCas) for each of its items.

A CAS reset needs to clear out these flattened indexes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
