Marshall Schor created UIMA-4357:
------------------------------------
Summary: create auxiliary flattened version of index and its
subtypes, automatically managed
Key: UIMA-4357
URL: https://issues.apache.org/jira/browse/UIMA-4357
Project: UIMA
Issue Type: Improvement
Reporter: Marshall Schor
Priority: Minor
Fix For: 2.7.1SDK
UIMA indexes allow retrieving items from the CAS, trading space (for the
indexes) for time (speed of finding items in the CAS, speed of iterating). For
sorted indexes over a type with subtypes, if the index isn't being modified, it
is possible to do a one-time extraction of the items in sorted order, save
them in an array, and iterate much more rapidly over that.

I've seen lots of cases of UIMA flows where some annotators create and index a
type (and its subtypes), and once that's been done, the indexes are not
subsequently updated for these types, but downstream annotators iterate over
them. It seems that lazy creation of this kind of flattened index would work
well in many cases.
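
A minimal sketch of the lazy one-time extraction, under stated assumptions: FlattenableIndex and Cursor are hypothetical stand-ins, not UIMA classes, and each type or subtype is modeled as a plain sorted list. The merge is a generic k-way heap merge, which may differ from what the UIMA iterators actually do internally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a sorted index over a type and its subtypes. */
class FlattenableIndex<T> {
    private final Comparator<? super T> order;
    // One sorted list per type/subtype, mimicking per-type sub-indexes.
    private final List<List<T>> subIndexes = new ArrayList<>();
    private List<T> flattened; // built lazily; null when absent or stale

    FlattenableIndex(Comparator<? super T> order, int typeCount) {
        this.order = order;
        for (int i = 0; i < typeCount; i++) {
            subIndexes.add(new ArrayList<>());
        }
    }

    /** Inserts into one sub-index in sort order and drops the flattened copy. */
    void add(int typeIndex, T item) {
        List<T> sub = subIndexes.get(typeIndex);
        int pos = 0;
        while (pos < sub.size() && order.compare(sub.get(pos), item) <= 0) {
            pos++;
        }
        sub.add(pos, item);
        flattened = null;
    }

    boolean isFlattened() {
        return flattened != null;
    }

    /** Returns all items in sort order; the merge runs only when no copy exists. */
    List<T> flattenedView() {
        if (flattened == null) {
            flattened = mergeAll();
        }
        return Collections.unmodifiableList(flattened);
    }

    private List<T> mergeAll() {
        // The heap orders the sub-index cursors by their current head element.
        PriorityQueue<Cursor<T>> heap =
            new PriorityQueue<>((a, b) -> order.compare(a.head, b.head));
        int total = 0;
        for (List<T> sub : subIndexes) {
            total += sub.size();
            Iterator<T> it = sub.iterator();
            if (it.hasNext()) {
                heap.add(new Cursor<>(it));
            }
        }
        List<T> out = new ArrayList<>(total);
        while (!heap.isEmpty()) {
            Cursor<T> c = heap.poll();
            out.add(c.head);
            if (c.advance()) {
                heap.add(c); // re-insert so the heap restores sort order
            }
        }
        return out;
    }

    /** Iterator over one sub-index plus its current element. */
    private static final class Cursor<E> {
        final Iterator<E> it;
        E head;

        Cursor(Iterator<E> it) {
            this.it = it;
            this.head = it.next();
        }

        boolean advance() {
            if (!it.hasNext()) {
                return false;
            }
            head = it.next();
            return true;
        }
    }
}
```

Repeated calls to flattenedView() reuse the merged list until the next add(), which is the case the downstream annotators would benefit from.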

It is important, I think, to preserve the same kind of
ConcurrentModificationException detection that the existing iterators provide.
To make this additional space-time trade-off automatic and reasonable, make the
flattened index reachable only via a SoftReference, so the GC can reclaim the
space if needed.
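
One way these two requirements might fit together, as a sketch only: SoftSnapshot, indexUpdated, and the rebuild supplier are hypothetical names, and a simple modification counter stands in for whatever update tracking the real indexes use. The SoftReference and ConcurrentModificationException classes are the standard JDK ones.

```java
import java.lang.ref.SoftReference;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Supplier;

/** Hypothetical holder for a flattened array that the GC is free to reclaim. */
class SoftSnapshot<T> {
    private int modCount;              // bumped on every index update (assumed hook)
    private int snapshotModCount = -1; // modCount at the time the array was built
    private SoftReference<T[]> ref = new SoftReference<>(null);

    void indexUpdated() {
        modCount++;
    }

    /** Returns the flattened array, rebuilding it if it was reclaimed or is stale. */
    T[] get(Supplier<T[]> rebuild) {
        T[] arr = ref.get();
        if (arr == null || snapshotModCount != modCount) {
            arr = rebuild.get();
            ref = new SoftReference<>(arr);
            snapshotModCount = modCount;
        }
        return arr;
    }

    /** Iterates the flattened array, failing fast if the index changes meanwhile. */
    Iterator<T> iterator(Supplier<T[]> rebuild) {
        final T[] arr = get(rebuild); // strong reference for the iterator's lifetime
        final int expected = modCount;
        return new Iterator<T>() {
            private int pos = 0;

            @Override
            public boolean hasNext() {
                return pos < arr.length;
            }

            @Override
            public T next() {
                if (modCount != expected) {
                    throw new ConcurrentModificationException();
                }
                if (pos >= arr.length) {
                    throw new NoSuchElementException();
                }
                return arr[pos++];
            }
        };
    }
}
```

The iterator keeps a strong reference to the array while it is in use, so the GC can only reclaim the flattened copy between iterations.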

Delay the creation of a flattened version until there's evidence that the index
will stay unmodified for some time. As a measure of what motivates its
creation, count the number of times an iterator over the index runs the
"heapifyUp/Down" code that manages the ordering of the subiterators to preserve
sort order. A basic indicator may be the number of times that occurs, without
an intervening update to the indexes, relative to the size of the index.
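
The indicator could be sketched roughly as follows. FlattenTrigger and its method names are hypothetical, and the ratio threshold is an assumed tuning knob; the issue does not say what value, if any, would be appropriate.

```java
/** Hypothetical trigger deciding when a flattened copy looks worthwhile. */
class FlattenTrigger {
    private final double ratio; // assumed tuning knob, not specified in the issue
    private long reorderCount;  // heapify-style reorderings since the last update

    FlattenTrigger(double ratio) {
        this.ratio = ratio;
    }

    /** Call whenever an iterator reorders its subiterators (heapifyUp/Down). */
    void recordReorder() {
        reorderCount++;
    }

    /** Call on any index update; the evidence of stability starts over. */
    void recordIndexUpdate() {
        reorderCount = 0;
    }

    /** True once the reorder work is large relative to the index size. */
    boolean shouldFlatten(int indexSize) {
        return indexSize > 0 && reorderCount >= ratio * indexSize;
    }
}
```

With a ratio of 2.0, for example, the trigger fires once iterators have done about two full passes' worth of reordering over an index with no update in between.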

The flattened array can save a bit more time by holding references to the Java
cover class instance (JCas or non-JCas) for each item.

CAS reset needs to clear out these flattened arrays.