[
https://issues.apache.org/jira/browse/UIMA-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711830#comment-13711830
]
Marshall Schor commented on UIMA-3075:
--------------------------------------
I checked this in a few ways. The code at the 2.4.1 level seems to be
operating correctly.
This example, when run now, produces output only from the first System.out
line, which I think is correct. The outer loop gets each "Token" annotation
in turn. The first one is the aaa token. The second iterator is a subiterator
that should print all annotations starting within the aaa token, "after" the
aaa token itself. There are none, so the output is now correct. I also tried
using some other Token spans as the container, and that worked OK.
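A minimal plain-Java sketch of the expected behavior described above, under stated assumptions. It does not use the UIMA API: `SubiteratorSketch`, `Span`, and `expectedSubiteratorResult` are hypothetical names, the list stands in for an annotation index sorted by begin offset, and treating a span that begins exactly at the container's end as still in range is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class SubiteratorSketch {

    /** Hypothetical stand-in for an annotation: covered text plus [begin, end) offsets. */
    public static final class Span {
        public final String text;
        public final int begin;
        public final int end;

        public Span(String text, int begin, int end) {
            this.text = text;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return "\"" + text + "\" [" + begin + ", " + end + ")";
        }
    }

    /**
     * Models what an unambiguous, non-strict subiterator is expected to return:
     * spans that follow the container in index order and begin inside the
     * container's range, skipping any span that overlaps the one returned before it.
     * The container is expected to be an element of the index.
     */
    public static List<Span> expectedSubiteratorResult(List<Span> index, Span container) {
        List<Span> result = new ArrayList<>();
        int nextAllowedBegin = container.begin;
        for (int i = index.indexOf(container) + 1; i < index.size(); i++) {
            Span s = index.get(i);
            if (s.begin > container.end) {
                break; // index is sorted by begin, so nothing later can be in range
            }
            if (s.begin >= nextAllowedBegin) {
                result.add(s);            // non-strict: s.end may lie beyond container.end
                nextAllowedBegin = s.end; // unambiguous: the next span must not overlap s
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Span> index = new ArrayList<>();
        index.add(new Span("aaa", 0, 3));
        index.add(new Span("bbb", 4, 7));
        index.add(new Span("ccc", 8, 11));
        index.add(new Span("ddd", 12, 15));

        // Same shape as the reproduction loop in the issue: outer span, then indented inner spans.
        for (Span outer : index) {
            System.out.println(outer);
            for (Span inner : expectedSubiteratorResult(index, outer)) {
                System.out.println("\t" + inner);
            }
        }
    }
}
```

With the four tokens from the issue, no token contains another, so this model yields no inner spans for any of them, which matches the 2.4.1 behavior reported here.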
Closing this.
> Unambiguous non-strict subiterator may return annotations outside the given
> annotation's range
> ----------------------------------------------------------------------------------------------
>
> Key: UIMA-3075
> URL: https://issues.apache.org/jira/browse/UIMA-3075
> Project: UIMA
> Issue Type: Bug
> Affects Versions: 2.4.0SDK
> Reporter: Alexander N Thomas
> Assignee: Richard Eckart de Castilho
> Priority: Minor
>
> REPRO: using a tokenizer that matches on "[^ ]" on "aaa bbb ccc ddd" I get
> four token annotations
> "aaa" 0-3
> "bbb" 4-7
> "ccc" 8-11
> "ddd" 12-15
> I then iterate over the token annotations, printing the covered text, begin
> and end of each. For each token I make an unambiguous non-strict subiterator
> and iterate over the annotations it returns, printing their covered text,
> begin and end, all indented.
>     Iterator<Annotation> iter = jcas.getAnnotationIndex(Token.type).iterator();
>     while (iter.hasNext()) {
>       Annotation a = iter.next();
>       System.out.println("\"" + a.getCoveredText() + "\""
>           + " [" + a.getBegin() + ", " + a.getEnd() + ")");
>       // ambiguous = false, strict = false: unambiguous, non-strict subiterator
>       Iterator<Annotation> featIter = jcas.getAnnotationIndex().subiterator(a, false, false);
>       while (featIter.hasNext()) {
>         Annotation b = featIter.next();
>         System.out.println("\t\"" + b.getCoveredText() + "\""
>             + " [" + b.getBegin() + ", " + b.getEnd() + ")");
>       }
>     }
> The output is
>     "aaa" [0, 3)
>         "bbb" [4, 7)
>     "bbb" [4, 7)
>         "ccc" [8, 11)
>     "ccc" [8, 11)
>         "ddd" [12, 15)
>     "ddd" [12, 15)
> I think this can be fixed by adding an extra check at Subiterator.java, line 127.
> NOW
>     while (it.isValid() && ((start > annot.getBegin())
>         || (strict && annot.getEnd() > end))) {
>       it.moveToNext();
>     }
> POSSIBLE FIX
>     while (it.isValid() && ((start > annot.getBegin() && annot.getBegin() <= end)
>         || (strict && annot.getEnd() > end))) {
>       it.moveToNext();
>     }