Re: Request for comments: Annotation interval relation specification

Richard Eckart de Castilho Mon, 09 Nov 2020 03:39:19 -0800

In the previous post, I said that the implementation change to reconcile 
"includeAnnotationsWithEndBeyondBounds" with the annotation predicate
matrix did not cause any tests to fail that were not expected to fail...


... it turns out, this was a premature observation. When I ran all tests
in uimaj-core in Eclipse as a batch, indeed no tests failed. However,
on Jenkins the org.apache.uima.cas.test.AnnotationIteratorTest.testIterator1()
failed and when I ran that in isolation in Eclipse, it also failed.

One of the reasons it fails is the change in the behavior of the Subiterator in
non-strict mode that was made to reconcile non-strict (aka 
"includeAnnotationsWithEndBeyondBounds") with the predicates. 

The test case assumes that an annotation Y that starts at the end position of
another annotation X is part of the iteration range:

UIMA <= 3.1.1
```
annotIndex: { [0-10], [10-20] }

annotIndex.subiterator([0-10], ambiguous, non-strict) = { [0-10], [10-20] }
```

However, if we want

  annotIndex.subiterator(..., ambiguous, non-strict) 

to be equivalent to 

  annotIndex.stream().filter(x -> x != y && (coveredBy(x, y) || overlappingAtEnd(x, y)))

then [10-20] should not be in the result list because 

  coveredBy       ([0-10], [10-20]) is false
  overlappingAtEnd([0-10], [10-20]) is false
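
To make the check above concrete, here is a minimal, self-contained sketch. It 
does not use the UIMA API: Span and NonStrictSketch are hypothetical stand-ins, 
and the two predicate bodies are simplified assumptions about what coveredBy 
and overlappingAtEnd mean (zero-width annotations are ignored). y is assumed to 
be the annotation that defines the subiterator bounds.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for an annotation: only begin/end offsets.
final class Span {
    final int begin;
    final int end;

    Span(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }

    @Override
    public String toString() {
        return "[" + begin + "-" + end + "]";
    }
}

final class NonStrictSketch {
    // Assumed definition: x lies entirely within the bounds of y.
    static boolean coveredBy(Span x, Span y) {
        return y.begin <= x.begin && x.end <= y.end;
    }

    // Assumed definition: x begins inside y and ends beyond the end of y.
    static boolean overlappingAtEnd(Span x, Span y) {
        return y.begin <= x.begin && x.begin < y.end && y.end < x.end;
    }

    // The stream-based filter from the mail, applied to a plain list.
    static List<Span> nonStrictSelection(List<Span> index, Span y) {
        return index.stream()
                .filter(x -> x != y && (coveredBy(x, y) || overlappingAtEnd(x, y)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Span y = new Span(0, 10); // bounding annotation, not part of the index
        List<Span> index = Arrays.asList(new Span(0, 10), new Span(10, 20));
        // [10-20] begins exactly at y.end, so neither predicate matches it.
        System.out.println(nonStrictSelection(index, y));
    }
}
```

Under these assumed definitions, only the [0-10] entry of the index passes the 
filter; [10-20] is dropped because its begin is not strictly before the end of 
the bounds.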

That is a bit of a dilemma.

On the one hand, I think the existing behavior of the non-strict subiterator is 
not good. It doesn't make sense. It clashes with the rationales that were 
discussed in conjunction with the annotation predicates.

On the other hand, it appears the behavior has been that way for ages and there 
might be code out there relying on this.

I tend towards making it so that the subiterator only exhibits consistent 
behavior with the predicates when it is used internally by .select() but to 
retain the old behavior when it is used through .subiterator()... that should 
at least not break old code. But then again it feels quite hacky to do it this 
way...

Any opinions?

Cheers,

-- Richard
