[
https://issues.apache.org/jira/browse/UIMA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Marshall Schor reopened UIMA-2434:
----------------------------------
Two issues: test case broken, and need to re-evaluate strategy for remove-all.
There is a trade-off to be made in the design. The current trade-off: improves
reliability by allowing existing iterators to function (if you do a
modification-resetting operation, like move-to-first) after a remove-all. But
it does this at the expense of not letting Java recover space from the indexes
- the indexes are reset-in-place, without freeing any storage. An alternative
impl would free the storage used by the index, but any existing iterators that
were iterating over that type would potentially not "see" the removal because
they're holding on to reference to the before-the-remove-all version of the
index.
UIMA's design has these kinds of trade-offs in other places. One way it's done
is to have two forms of the methods (e.g. in the CAS there are regular methods
and "low-level" - use at your own risk - methods).
So, perhaps we could do the same here, and have a 2nd variety of these methods,
e.g.
removeAllIncludingSubtypesReclaimingSpace
and adding a note that any iterator which might include FeatureStructures being
removed needs to be discarded (and recreated if needed).
Or perhaps this is over-design/over-thinking, and this variety isn't really of
sufficient interest? Opinions?
> Feature structure removal from sorted index is very slow
> --------------------------------------------------------
>
> Key: UIMA-2434
> URL: https://issues.apache.org/jira/browse/UIMA-2434
> Project: UIMA
> Issue Type: Improvement
> Components: Core Java Framework
> Affects Versions: 2.3.1SDK
> Reporter: Mikhail Sogrin
> Assignee: Marshall Schor
> Fix For: 2.4.1SDK
>
>
> Removal of feature structures from sorted indexes (e.g. default index) is
> very slow. FSIntArrayIndex.remove() method performs two operations: linear
> search in the array until the given FS is found, followed by the shift of
> elements to the end of this array by one position to the left.
> If many annotations (millions and more) are being deleted at once, this
> operation gets very very slow - much slower than adding these annotations in
> the first place. It seems to require O(N^2) time to remove N annotations.
> One item is the linear search, which can be replaced by the binary search
> method, which is already implemented in the same class.
> Second, array copy can be done with Java built-in method instead of a custom
> loop.
> Ideally, a method for bulk removal of a collection of annotations would have
> been the most efficient, for example a method to remove all annotations of a
> given type.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira