[
https://issues.apache.org/jira/browse/UIMA-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648113#comment-13648113
]
Marshall Schor commented on UIMA-2434:
--------------------------------------
After playing around more with this, I think I found a bug, and after fixing
it, the iterators can be made to work in the presence of removeAll while
still doing the "efficient" step of resetting index structures that may have
grown beyond their initial size. Working on a fix.

UIMA partially follows Java's "fail fast" iterator design, but doesn't
implement a remove-within-iterator operation (this perhaps could be done, but
that part of the implementation is very difficult to get 100% right, due to
the complexities).
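
For readers unfamiliar with the "fail fast" design mentioned above, here is a minimal sketch of the general Java pattern, not UIMA's actual code. The names TinyIndex, Cursor and modCount are hypothetical; the only assumption carried over from the comment is that removeAll resets storage that may have grown beyond its initial size.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.NoSuchElementException;

/** Hypothetical index used only to illustrate fail-fast iteration. */
class TinyIndex {
    private static final int INITIAL_CAPACITY = 4;
    private int[] data = new int[INITIAL_CAPACITY];
    private int size;
    private int modCount; // incremented on every structural change

    void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2);
        }
        data[size++] = value;
        modCount++;
    }

    /** Drops all entries and shrinks storage back to its initial size. */
    void removeAll() {
        data = new int[INITIAL_CAPACITY];
        size = 0;
        modCount++;
    }

    Cursor cursor() {
        return new Cursor();
    }

    /** Read-only cursor; it has no remove operation, as in the comment. */
    class Cursor {
        private int pos;
        private final int expectedModCount = modCount;

        boolean hasNext() {
            checkForModification();
            return pos < size;
        }

        int next() {
            checkForModification();
            if (pos >= size) {
                throw new NoSuchElementException();
            }
            return data[pos++];
        }

        // Fail fast: refuse to continue once the index changed underneath us.
        private void checkForModification() {
            if (expectedModCount != modCount) {
                throw new ConcurrentModificationException();
            }
        }
    }
}
```

In this sketch a cursor created before removeAll throws on its next use, while a cursor created afterwards iterates the (now empty) index normally. How UIMA's own iterators should behave after removeAll is exactly what the fix under discussion addresses.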
> Feature structure removal from sorted index is very slow
> --------------------------------------------------------
>
> Key: UIMA-2434
> URL: https://issues.apache.org/jira/browse/UIMA-2434
> Project: UIMA
> Issue Type: Improvement
> Components: Core Java Framework
> Affects Versions: 2.3.1SDK
> Reporter: Mikhail Sogrin
> Assignee: Marshall Schor
> Fix For: 2.4.1SDK
>
>
> Removal of feature structures from sorted indexes (e.g. the default index) is
> very slow. The FSIntArrayIndex.remove() method performs two operations: a
> linear search of the array until the given FS is found, followed by a shift
> of all subsequent elements one position to the left.
> If many annotations (millions or more) are deleted at once, this operation
> becomes very slow, much slower than adding those annotations in the first
> place. It appears to require O(N^2) time to remove N annotations.
> First, the linear search can be replaced by the binary search method that is
> already implemented in the same class.
> Second, the array copy can be done with a Java built-in method instead of a
> custom loop.
> Ideally, a method for bulk removal of a collection of annotations would be
> the most efficient, for example a method to remove all annotations of a
> given type.
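
The three suggestions in the description can be sketched as follows. This is an illustration under stated assumptions, not the FSIntArrayIndex code: SortedIntIndex is a hypothetical class that sorts plain int values (the real index orders feature structures with a comparator), and System.arraycopy is assumed to be the "Java built-in method" the reporter has in mind.

```java
import java.util.Arrays;
import java.util.function.IntPredicate;

/** Hypothetical sorted int index; a stand-in, not the UIMA class. */
class SortedIntIndex {
    private int[] data;
    private int size;

    SortedIntIndex(int capacity) {
        data = new int[Math.max(capacity, 1)];
    }

    /** Inserts a value, keeping the array in ascending order. */
    void add(int value) {
        int pos = Arrays.binarySearch(data, 0, size, value);
        if (pos < 0) {
            pos = -(pos + 1); // convert "not found" result to insertion point
        }
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2);
        }
        System.arraycopy(data, pos, data, pos + 1, size - pos);
        data[pos] = value;
        size++;
    }

    /** Single removal: binary search, then one block copy to close the gap. */
    boolean remove(int value) {
        int pos = Arrays.binarySearch(data, 0, size, value);
        if (pos < 0) {
            return false;
        }
        System.arraycopy(data, pos + 1, data, pos, size - pos - 1);
        size--;
        return true;
    }

    /** Bulk removal: one compaction pass, so removing many entries is O(N). */
    int removeIf(IntPredicate shouldRemove) {
        int kept = 0;
        for (int i = 0; i < size; i++) {
            if (!shouldRemove.test(data[i])) {
                data[kept++] = data[i];
            }
        }
        int removed = size - kept;
        size = kept;
        return removed;
    }

    int size() {
        return size;
    }

    int[] toArray() {
        return Arrays.copyOf(data, size);
    }
}
```

Note that binary search only speeds up locating the element. Each single removal still shifts the tail of the array, so removing N entries one at a time remains quadratic in the worst case; the single-pass removeIf is what brings bulk deletion down to linear time.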