[ https://issues.apache.org/jira/browse/UIMA-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Klügl reassigned UIMA-2455:
---------------------------------
Assignee: Peter Klügl
> Make ordering of getNextAnnotations result configurable
> -------------------------------------------------------
>
> Key: UIMA-2455
> URL: https://issues.apache.org/jira/browse/UIMA-2455
> Project: UIMA
> Issue Type: New Feature
> Components: TextMarker
> Reporter: Rinat Gareyev
> Assignee: Peter Klügl
>
> Example rule:
> A B C{NOT(PARTOF(D))->MARK(D,3)};
> Example text:
> aText bText cText cMoreText
> where the following correspondence between annotations and tokens holds:
> A = aText
> B = bText
> C = cText
> C = cText cMoreText
> Rule results in the following:
> D = cText
> However I expect that:
> D = cText cMoreText
> The reason for the actual behaviour is the
> org.apache.uima.textmarker.rule.AnnotationComparator#compare implementation.
> It orders a shorter annotation before a longer one. That is why the sequence
> 'aText bText cText' is matched, while the sequence 'aText bText cText
> cMoreText' is not: it is considered later and then does not pass the NOT
> PARTOF condition.
> I discovered this after migrating to the latest TextMarker sources (from the ASF
> repo). Before that we used the version from Sourceforge.net. In the old (Sourceforge)
> version this problem did not arise because TextMarkerBasic could keep only
> one annotation per Type as 'begin anchor'. Returning to the example, this
> means that the TextMarkerBasic for 'cText' held only one C annotation as begin anchor.
> In the current version (rev. 1371274) TextMarkerBasic keeps a set of begin and
> end anchors per Type. This is actually a good improvement.
> However, I suggest making the ordering of anchored annotations returned by the
> TextMarkerRuleElement#getNextAnnotations(boolean, AnnotationFS,
> TextMarkerStream) method more controllable,
> e.g., by adding a parameter for TextMarkerEngine or the script that
> selects the AnnotationComparator#compare implementation.
> Also, returning longer annotations before shorter ones seems to be more
> consistent with the UIMA default indexing. See
> http://uima.apache.org/d/uimaj-2.4.0/references.html#ugr.ref.cas.index.built_in_indexes
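A minimal sketch of the ordering question raised above, not the actual TextMarker code. It assumes candidates are ordered by begin offset first and by end offset second; Span, LengthOrder, comparator and firstCandidate are hypothetical names invented for illustration, and LengthOrder is not an existing TextMarkerEngine parameter. The two C annotations from the example text are sorted under both settings to show which one would be tried first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class AnchorOrderingSketch {

    /** Hypothetical stand-in for an annotation: offsets plus covered text. */
    static final class Span {
        final int begin;
        final int end;
        final String text;

        Span(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    /** Hypothetical setting; assumed here, not an existing engine parameter. */
    enum LengthOrder { SHORTER_FIRST, LONGER_FIRST }

    /** Orders by begin ascending, then by end according to the chosen setting. */
    static Comparator<Span> comparator(LengthOrder order) {
        Comparator<Span> byBegin = Comparator.comparingInt((Span s) -> s.begin);
        Comparator<Span> byEnd = Comparator.comparingInt((Span s) -> s.end);
        // LONGER_FIRST mirrors the built-in UIMA annotation index (end descending).
        return byBegin.thenComparing(
                order == LengthOrder.LONGER_FIRST ? byEnd.reversed() : byEnd);
    }

    /** Returns the text of the C annotation that would be considered first. */
    static String firstCandidate(LengthOrder order) {
        // Offsets in "aText bText cText cMoreText"
        List<Span> cAnnotations = new ArrayList<>();
        cAnnotations.add(new Span(12, 17, "cText"));
        cAnnotations.add(new Span(12, 27, "cText cMoreText"));
        cAnnotations.sort(comparator(order));
        return cAnnotations.get(0).text;
    }

    public static void main(String[] args) {
        System.out.println(firstCandidate(LengthOrder.SHORTER_FIRST));
        System.out.println(firstCandidate(LengthOrder.LONGER_FIRST));
    }
}
```

With SHORTER_FIRST the candidate 'cText' comes first, which corresponds to the reported result D = cText; with LONGER_FIRST 'cText cMoreText' comes first, which corresponds to the expected result. How such a setting would actually be passed to the engine or script is left open here.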