[ https://issues.apache.org/jira/browse/UIMA-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13442639#comment-13442639 ]

Rinat Gareyev commented on UIMA-2455:
-------------------------------------

Yep, it seems to be ok now. Should I resolve this issue?
                
> Make ordering of getNextAnnotations result configurable
> -------------------------------------------------------
>
>                 Key: UIMA-2455
>                 URL: https://issues.apache.org/jira/browse/UIMA-2455
>             Project: UIMA
>          Issue Type: New Feature
>          Components: TextMarker
>            Reporter: Rinat Gareyev
>            Assignee: Peter Klügl
>
> Example rule:
> A B C{NOT(PARTOF(D))->MARK(D,3)};
> Example text:
> aText bText cText cMoreText
> where the following correspondences between annotations and tokens hold:
> A = aText
> B = bText
> C = cText
> C = cText cMoreText
> The rule results in the following:
> D = cText
> However, I expect:
> D = cText cMoreText
> The reason for the actual behaviour is the
> org.apache.uima.textmarker.rule.AnnotationComparator#compare implementation.
> It orders shorter annotations before longer ones. That is why the sequence
> 'aText bText cText' is matched, while the sequence 'aText bText cText
> cMoreText' is not: it is considered later, once the shorter match has already
> been marked as D, and therefore fails the NOT(PARTOF(D)) condition.
> I noticed this after migrating to the latest TextMarker sources (from the ASF
> repo). Before that, we used the version from Sourceforge.net. In the old
> (Sourceforge) version this problem did not arise because TextMarkerBasic could
> keep only one annotation per Type as 'begin anchor'. In the example, this
> means that the TextMarkerBasic for 'cText' held only one C annotation as
> begin anchor.
> In the current version (rev. 1371274), TextMarkerBasic keeps a set of begin
> and end anchors per Type. This is actually a good improvement.
> However, I suggest making the ordering of the anchored annotations returned by
> the TextMarkerRuleElement#getNextAnnotations(boolean, AnnotationFS,
> TextMarkerStream) method more controllable,
> e.g., by adding a parameter to TextMarkerEngine or to the script that selects
> the AnnotationComparator#compare implementation.
> Also, returning longer annotations before shorter ones seems more consistent
> with UIMA's default indexing. See
> http://uima.apache.org/d/uimaj-2.4.0/references.html#ugr.ref.cas.index.built_in_indexes
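The ordering problem and the proposed switch can be illustrated without UIMA on the classpath. The following is a minimal Java sketch under stated assumptions: `Span` is a hypothetical stand-in for an annotation that carries only a type and its begin/end offsets; the shorter-first comparator approximates the behaviour reported for AnnotationComparator#compare (its actual code is not reproduced here); and `AnchorOrdering`, `comparatorFor` and `ordered` are hypothetical names for the kind of setting suggested above, not an existing TextMarker parameter. The longer-first comparator follows the order of UIMA's built-in annotation index (begin ascending, end descending; the real index additionally breaks ties by type priority, which is omitted here).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class AnchorOrderingDemo {

    // Hypothetical stand-in for an annotation: a typed [begin, end) character span.
    // It is NOT org.apache.uima.cas.text.AnnotationFS; it only holds the offsets
    // that the ordering depends on.
    record Span(String type, int begin, int end) {}

    // Hypothetical configuration switch; not an existing TextMarker parameter.
    enum AnchorOrdering { SHORTER_FIRST, LONGER_FIRST }

    // Begin ascending, then end ascending: among spans with the same begin,
    // the shorter one comes first (the behaviour reported in this issue).
    static final Comparator<Span> SHORTER_BEFORE_LONGER =
            Comparator.comparingInt(Span::begin).thenComparingInt(Span::end);

    // Begin ascending, then end descending: among spans with the same begin,
    // the longer one comes first, as in UIMA's built-in annotation index.
    static final Comparator<Span> LONGER_BEFORE_SHORTER =
            Comparator.comparingInt(Span::begin)
                    .thenComparing(Comparator.comparingInt(Span::end).reversed());

    // Maps the (hypothetical) setting to a comparator.
    static Comparator<Span> comparatorFor(AnchorOrdering ordering) {
        return switch (ordering) {
            case SHORTER_FIRST -> SHORTER_BEFORE_LONGER;
            case LONGER_FIRST -> LONGER_BEFORE_SHORTER;
        };
    }

    // Returns a sorted copy of the candidate anchors under the chosen ordering.
    static List<Span> ordered(List<Span> candidates, AnchorOrdering ordering) {
        List<Span> copy = new ArrayList<>(candidates);
        copy.sort(comparatorFor(ordering));
        return copy;
    }

    public static void main(String[] args) {
        String text = "aText bText cText cMoreText";
        // The two C annotations from the example share begin offset 12:
        // "cText" = [12, 17) and "cText cMoreText" = [12, 27).
        List<Span> cAnchors = List.of(new Span("C", 12, 17), new Span("C", 12, 27));
        for (AnchorOrdering ordering : AnchorOrdering.values()) {
            Span first = ordered(cAnchors, ordering).get(0);
            System.out.println(ordering + " -> " + text.substring(first.begin(), first.end()));
        }
    }
}
```

Running main prints "SHORTER_FIRST -> cText" and "LONGER_FIRST -> cText cMoreText". With the longer-first order, the longer C annotation would be tried first, which corresponds to the result the reporter expected (D = cText cMoreText).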

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
