[
https://issues.apache.org/jira/browse/UIMA-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16449701#comment-16449701
]
Peter Klügl commented on UIMA-5757:
-----------------------------------
With the default filtering settings, annotations that start or end with MARKUP
are not visible. The Document/DocumentAnnotation annotation is an exception: it
is always visible and can therefore always be matched. All other annotations
with the same offsets, however, follow the common filtering rules. Thus, in
order to match annotations of specific types that cover the complete sofa
string, you need to retain all filtered types, e.g., MARKUP and
maybe BREAK/ES in your example.
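
A minimal sketch of how the script from the issue could retain the filtered
type before matching, assuming MARKUP is the only filtered type touching the
boundaries of the annotation (further types would have to be added to the
RETAINTYPE call otherwise). The actions are attached to Document here, which
is always visible:

{code:java}
// RUTA script (sketch, not verified against the reporter's setup)
STRING documentId = "Unknown";
// make MARKUP visible so that annotations ending with it can be matched
Document{-> RETAINTYPE(MARKUP)};
com.acme.uima.types.MyDocument{-> GETFEATURE("documentId", documentId)};
// reset the retained types to the default again
Document{-> RETAINTYPE};
Document{-> LOG("Starting to process document: " + documentId)};
{code}

Calling RETAINTYPE without arguments afterwards keeps the changed filtering
local to the rule that needs it, so later rules still skip markup as usual.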
> Unable to extract features when annotation ends with HTML tag
> -------------------------------------------------------------
>
> Key: UIMA-5757
> URL: https://issues.apache.org/jira/browse/UIMA-5757
> Project: UIMA
> Issue Type: Bug
> Components: Ruta
> Affects Versions: 2.6.1ruta
> Environment: RUTA 2.6.1, Windows 10, Eclipse Mars, JDK 1.8.0_144
> Reporter: Miguel Alvarez
> Priority: Minor
>
> If there is an annotation that covers the whole sofa string, and the sofa
> string ends with an HTML tag, it seems like RUTA isn't able to extract the
> features for that annotation. For instance, let's suppose this document
> (represented as XMI):
>
> {code:java}
> // XMI document
> <?xml version="1.0" encoding="UTF-8"?>
> <xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
> xmlns:cas="http:///uima/cas.ecore" xmlns:tcas="http:///uima/tcas.ecore"
> xmlns:types="http:///com/acme/uima/types.ecore" xmi:version="2.0">
> <cas:NULL xmi:id="0"/>
> <tcas:DocumentAnnotation xmi:id="8" sofa="1" begin="0" end="12"
> language="es"/>
> <types:MyDocument xmi:id="14" sofa="1" begin="0" end="12"
> documentId="test_docsize_39d5541c-5e7f-391c-95af-c82ce6306644"/>
> <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text"
> sofaString="ABCDEFGHIJ&lt;p&gt;"/>
> <cas:View sofa="1" members="8 14"/>
> </xmi:XMI>
> {code}
> And the following RUTA script:
>
>
> {code:java}
> // RUTA script
> STRING documentId = "Unknown";
> com.acme.uima.types.MyDocument{-> GETFEATURE("documentId", documentId)};
> LOG("Starting to process document: " + documentId);
> {code}
> The LOG action will output Unknown. But as soon as the string doesn't end
> with an HTML tag, it works fine.
>
> Any ideas what could be going on?