Alexander N Thomas created UIMA-3075:
----------------------------------------

             Summary: Unambiguous non-strict subiterator may return annotations outside the given annotation's range
                 Key: UIMA-3075
                 URL: https://issues.apache.org/jira/browse/UIMA-3075
             Project: UIMA
          Issue Type: Bug
    Affects Versions: 2.4.0C
            Reporter: Alexander N Thomas
            Priority: Minor


REPRO: Using a tokenizer that matches on "[^ ]" over the text "aaa bbb ccc ddd", I get four token annotations:

"aaa" 0-3
"bbb" 4-7
"ccc" 8-11
"ddd" 12-15
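A plain-Java sketch (no UIMA involved) of where these offsets come from. It assumes the pattern `[^ ]+`, because a single-character `[^ ]` would produce one-character tokens rather than "aaa", "bbb", and so on. UIMA end offsets are exclusive, so each token covers the half-open range [begin, end). The class name `TokenOffsets` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenOffsets {
    // Formats every maximal run of non-space characters as "text" [begin, end).
    // Assumption: the tokenizer pattern is "[^ ]+" (one or more non-space chars).
    static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("[^ ]+").matcher(text);
        while (m.find()) {
            // Matcher.end() is exclusive, like a UIMA annotation's end offset.
            out.add("\"" + m.group() + "\" [" + m.start() + ", " + m.end() + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        tokenize("aaa bbb ccc ddd").forEach(System.out::println);
    }
}
```

Running `main` prints the same four spans listed above, in the `[begin, end)` notation used by the output further down.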

I then iterate over the token annotations, printing each one's covered text, begin, and end. For each token I create an unambiguous, non-strict subiterator and iterate over it, printing the covered text, begin, and end of every annotation it returns, indented.


    import java.util.Iterator;
    import org.apache.uima.jcas.tcas.Annotation;

    Iterator<Annotation> iter = jcas.getAnnotationIndex(Token.type).iterator();
    while (iter.hasNext()) {
        Annotation a = iter.next();
        // Outer token, printed as "text" [begin, end)
        System.out.println("\"" + a.getCoveredText() + "\"" + " [" + a.getBegin() + ", " + a.getEnd() + ")");
        // subiterator(annot, ambiguous = false, strict = false): unambiguous, non-strict, bounded by a
        Iterator<Annotation> featIter = jcas.getAnnotationIndex().subiterator(a, false, false);
        while (featIter.hasNext()) {
            Annotation b = featIter.next();
            System.out.println("\t\"" + b.getCoveredText() + "\"" + " [" + b.getBegin() + ", " + b.getEnd() + ")");
        }
    }

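The output is below. Every subiterator except the last returns the following token, even though that token begins after the enclosing token ends: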
"aaa" [0, 3)
        "bbb" [4, 7)
"bbb" [4, 7)
        "ccc" [8, 11)
"ccc" [8, 11)
        "ddd" [12, 15)
"ddd" [12, 15)
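To make the expected behavior concrete, here is a simplified, self-contained model of an unambiguous, non-strict subiterator. It is a sketch under stated assumptions, not UIMA's `Subiterator` code. The names `SubiteratorModel`, `Span`, `tokens`, and `subiterate` are hypothetical. The model assumes that non-strict means a returned annotation must begin inside [bound.begin, bound.end) but may end past bound.end, that unambiguous means no returned annotation overlaps one already returned, and that the bounding annotation itself is never returned.

```java
import java.util.ArrayList;
import java.util.List;

public class SubiteratorModel {
    // Hypothetical stand-in for an annotation: covered text plus half-open span [begin, end).
    record Span(String text, int begin, int end) {}

    // The four tokens from the repro, sorted by begin offset.
    static List<Span> tokens() {
        return List.of(new Span("aaa", 0, 3), new Span("bbb", 4, 7),
                       new Span("ccc", 8, 11), new Span("ddd", 12, 15));
    }

    // Simplified model (NOT UIMA's implementation) of an unambiguous, non-strict
    // subiterator over annotations sorted by begin offset.
    static List<Span> subiterate(List<Span> sorted, Span bound) {
        List<Span> result = new ArrayList<>();
        int lastEnd = bound.begin();
        for (Span s : sorted) {
            if (s == bound) continue;                // assumed: bound itself is excluded
            if (s.begin() < bound.begin()) continue; // starts before the bound
            if (s.begin() >= bound.end()) break;     // starts outside the bound: stop here
            if (s.begin() < lastEnd) continue;       // unambiguous: overlaps previous result
            result.add(s);                           // non-strict: s.end() may exceed bound.end()
            lastEnd = s.end();
        }
        return result;
    }

    public static void main(String[] args) {
        List<Span> toks = tokens();
        for (Span a : toks) {
            System.out.println("\"" + a.text() + "\" [" + a.begin() + ", " + a.end() + ")");
            for (Span b : subiterate(toks, a)) {
                System.out.println("\t\"" + b.text() + "\" [" + b.begin() + ", " + b.end() + ")");
            }
        }
    }
}
```

Under this model the repro prints only the four token lines with no indented lines, because no token begins inside another token's range. The reported output suggests the real subiterator lacks an equivalent of the `begin >= bound.end` stop check in the non-strict case.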

I think this can be fixed by adding an extra check at Subiterator.java, line 127:
NOW
    while (it.isValid() && ((start > annot.getBegin()) || (strict && annot.getEnd() > end))) {
      it.moveToNext();
    }
POSSIBLE FIX
    while (it.isValid() && ((start > annot.getBegin() && annot.getBegin() <= end) || (strict && annot.getEnd() > end))) {
      it.moveToNext();
    }
