Alexander N Thomas created UIMA-3075:
----------------------------------------
Summary: Unambiguous non-strict subiterator may return annotations
outside the given annotation's range
Key: UIMA-3075
URL: https://issues.apache.org/jira/browse/UIMA-3075
Project: UIMA
Issue Type: Bug
Affects Versions: 2.4.0C
Reporter: Alexander N Thomas
Priority: Minor
REPRO: Using a tokenizer that matches "[^ ]+" (runs of non-space characters)
on the text "aaa bbb ccc ddd", I get four token annotations:
"aaa" 0-3
"bbb" 4-7
"ccc" 8-11
"ddd" 12-15
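The offsets above can be reproduced without UIMA. The following is only a minimal sketch using java.util.regex; it assumes the tokenizer behaves like the pattern "[^ ]+", and the class name TokenSpans and the method tokenize are made up for illustration, not part of UIMA.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenSpans {
    // Returns {begin, end} pairs (end exclusive) for every run of
    // non-space characters. Assumption: the real tokenizer is
    // equivalent to the pattern "[^ ]+".
    static List<int[]> tokenize(String text) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = Pattern.compile("[^ ]+").matcher(text);
        while (m.find()) {
            spans.add(new int[] { m.start(), m.end() });
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "aaa bbb ccc ddd";
        for (int[] s : tokenize(text)) {
            // Same layout as the list above: "aaa" 0-3
            System.out.println("\"" + text.substring(s[0], s[1]) + "\" "
                + s[0] + "-" + s[1]);
        }
    }
}
```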
I then iterate over the token annotations, printing each one's covered text,
begin, and end. For each token I create an unambiguous, non-strict subiterator
and print the covered text, begin, and end of every annotation it returns,
indented by one tab.
    Iterator<Annotation> iter =
        jcas.getAnnotationIndex(Token.type).iterator();
    while (iter.hasNext()) {
      Annotation a = iter.next();
      System.out.println("\"" + a.getCoveredText() + "\""
          + " [" + a.getBegin() + ", " + a.getEnd() + ")");
      // subiterator(annot, ambiguous, strict): unambiguous, non-strict
      Iterator<Annotation> featIter =
          jcas.getAnnotationIndex().subiterator(a, false, false);
      while (featIter.hasNext()) {
        Annotation b = featIter.next();
        System.out.println("\t\"" + b.getCoveredText() + "\""
            + " [" + b.getBegin() + ", " + b.getEnd() + ")");
      }
    }
The output is:
"aaa" [0, 3)
    "bbb" [4, 7)
"bbb" [4, 7)
    "ccc" [8, 11)
"ccc" [8, 11)
    "ddd" [12, 15)
"ddd" [12, 15)
I think this can be fixed by adding an extra check in Subiterator.java at line 127.
NOW:
    while (it.isValid() && ((start > annot.getBegin())
        || (strict && annot.getEnd() > end))) {
      it.moveToNext();
    }
POSSIBLE FIX:
    while (it.isValid() && ((start > annot.getBegin() && annot.getBegin() <= end)
        || (strict && annot.getEnd() > end))) {
      it.moveToNext();
    }
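To check any fix against the reproduction, the expected result can be modeled without UIMA. The sketch below is not the Subiterator implementation. It assumes that a non-strict subiterator should return only annotations that come after the given annotation in the index and whose begin lies inside its [begin, end) range; that reading of the contract is an assumption, as are the names SubiteratorModel and covered. It also ignores the unambiguous (non-overlapping) filtering, which makes no difference here because the tokens never overlap.

```java
import java.util.ArrayList;
import java.util.List;

class SubiteratorModel {
    // Assumed rule (not taken from UIMA source): keep spans that follow
    // position i in a begin-sorted index and whose begin lies inside
    // [begin, end) of the span at position i.
    static List<int[]> covered(List<int[]> index, int i) {
        int[] a = index.get(i);
        List<int[]> result = new ArrayList<>();
        for (int j = i + 1; j < index.size(); j++) {
            int[] b = index.get(j);
            if (b[0] >= a[0] && b[0] < a[1]) {
                result.add(b);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The four tokens from the reproduction, as {begin, end} pairs.
        List<int[]> tokens = new ArrayList<>();
        tokens.add(new int[] { 0, 3 });
        tokens.add(new int[] { 4, 7 });
        tokens.add(new int[] { 8, 11 });
        tokens.add(new int[] { 12, 15 });
        for (int i = 0; i < tokens.size(); i++) {
            int[] t = tokens.get(i);
            System.out.println("[" + t[0] + ", " + t[1] + ") -> "
                + covered(tokens, i).size() + " covered");
        }
    }
}
```

Under this assumption every token reports 0 covered annotations, so a corrected subiterator would print no indented lines for this input.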