In the previous post, I said that the implementation change to reconcile
"includeAnnotationsWithEndBeyondBounds" with the annotation predicate
matrix did not cause any tests to fail which were not expected to fail...
... it turns out this was a premature observation. When I ran all tests
in uimaj-core in Eclipse as a batch, indeed no tests failed. However,
on Jenkins, org.apache.uima.cas.test.AnnotationIteratorTest.testIterator1()
failed, and when I ran that test in isolation in Eclipse, it also failed.
One of the reasons it fails is the change in the behavior of the Subiterator in
non-strict mode that was made to reconcile non-strict (aka
"includeAnnotationsWithEndBeyondBounds") with the predicates.
The test case assumes that an annotation Y that starts at the end position of
another annotation X is part of the iteration range:
UIMA <= 3.1.1
```
annotIndex: { [0-10], [10-20] }
annotIndex.subiterator([0-10], ambiguous, non-strict) = { [0-10], [10-20] }
```
However, if we want
  annotIndex.subiterator(..., ambiguous, non-strict)
to be equivalent to
  annotIndex.stream().filter(x -> x != y && (coveredBy(x, y) ||
      overlappingAtEnd(x, y)))
then [10-20] should not be in the result list because
  coveredBy([0-10], [10-20]) is false
  overlappingAtEnd([0-10], [10-20]) is false
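To make the two checks above concrete, here is a minimal standalone sketch
that does not use UIMA at all. The Span class, the nonStrictSelection helper
and the exact boundary handling inside coveredBy and overlappingAtEnd are my
assumptions for illustration, not the actual implementation of the predicates.
The bounding annotation is modeled as a separate instance with the same
offsets as the first index entry.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SubiteratorSketch {
    // Hypothetical stand-in for an annotation: only begin/end offsets.
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + begin + "-" + end + "]";
        }
    }

    // Assumed semantics: x lies entirely within the bounds of y.
    static boolean coveredBy(Span x, Span y) {
        return y.begin <= x.begin && x.end <= y.end;
    }

    // Assumed semantics: x begins inside y (before y's end) and ends beyond y's end.
    static boolean overlappingAtEnd(Span x, Span y) {
        return y.begin <= x.begin && x.begin < y.end && y.end < x.end;
    }

    // The stream/filter expression from the mail, with y as the bounding span.
    // Note that x != y is an identity comparison here.
    static List<Span> nonStrictSelection(List<Span> index, Span y) {
        return index.stream()
                .filter(x -> x != y && (coveredBy(x, y) || overlappingAtEnd(x, y)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Span> index = Arrays.asList(new Span(0, 10), new Span(10, 20));
        Span bound = new Span(0, 10);
        // [10-20] begins exactly at the end of the bound, so neither predicate holds.
        System.out.println(nonStrictSelection(index, bound)); // prints [[0-10]]
    }
}
```

Under these assumptions, [10-20] is dropped because its begin is not strictly
before the end of the bound, whereas the UIMA <= 3.1.1 subiterator returns it.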
That is a bit of a dilemma.
On the one hand, I think the existing behavior of the non-strict subiterator
does not make sense: it clashes with the rationales that were discussed in
conjunction with the annotation predicates.
On the other hand, it appears the behavior has been that way for ages and there
might be code out there relying on this.
I lean towards making the subiterator behave consistently with the predicates
only when it is used internally by .select(), and retaining the old behavior
when it is used through .subiterator()... that should at least not break old
code. But then again, it feels quite hacky to do it this way...
Any opinions?
Cheers,
-- Richard