Peter Klügl created UIMA-2757:
---------------------------------
Summary: TextMarker: Add wildcard rule element
Key: UIMA-2757
URL: https://issues.apache.org/jira/browse/UIMA-2757
Project: UIMA
Issue Type: Bug
Components: TextMarker
Reporter: Peter Klügl
Assignee: Peter Klügl
Right now, something like a wildcard or an I-don't-care rule element can be
implemented with ANY*?. However, such a rule element inspects each token
until the next rule element matches successfully, which makes it slow if
many tokens lie in between.
A real wildcard, which just skips everything, would be really useful (and
faster). This can be implemented by not iterating over the visible inference
annotations, but instead finding a matchable position in the index and then
checking whether it is visible. Since the next rule element can be quite
complex, it may be better to just match the next annotation and, if that
one is invisible, return a failed match. This behavior needs some careful
testing in different use cases.
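The difference between the two strategies can be sketched as follows. This is only a toy model in Python, not TextMarker code: the Ann class, scan_match, build_index and wildcard_match are hypothetical names, and a sorted list of begin offsets per type stands in for the annotation index.

```python
from bisect import bisect_left
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Ann:
    """Hypothetical stand-in for an annotation; not a UIMA or TextMarker class."""
    begin: int
    end: int
    type: str
    visible: bool = True


def scan_match(tokens, start, target_type):
    """ANY*?-style: inspect each visible token from `start` until the target type matches.

    Returns (matched annotation or None, number of tokens inspected).
    """
    inspected = 0
    for tok in tokens:
        if tok.begin < start or not tok.visible:
            continue
        inspected += 1
        if tok.type == target_type:
            return tok, inspected
    return None, inspected


def build_index(annotations):
    """Group annotations by type, sorted by begin offset, with a parallel list of begins."""
    by_type = defaultdict(list)
    for ann in sorted(annotations, key=lambda a: a.begin):
        by_type[ann.type].append(ann)
    return {t: ([a.begin for a in anns], anns) for t, anns in by_type.items()}


def wildcard_match(index, start, target_type):
    """Wildcard-style: jump to the next annotation of the target type at or after `start`.

    If that annotation is invisible, report a failed match (None) instead of searching on.
    """
    begins, anns = index.get(target_type, ([], []))
    pos = bisect_left(begins, start)
    if pos == len(anns):
        return None
    candidate = anns[pos]
    return candidate if candidate.visible else None


# "Peter went to the store." with whitespace tokens filtered (invisible).
tokens = [
    Ann(0, 5, "CW"), Ann(5, 6, "SPACE", visible=False),
    Ann(6, 10, "SW"), Ann(10, 11, "SPACE", visible=False),
    Ann(11, 13, "SW"), Ann(13, 14, "SPACE", visible=False),
    Ann(14, 17, "SW"), Ann(17, 18, "SPACE", visible=False),
    Ann(18, 23, "SW"), Ann(23, 24, "PERIOD"),
]
index = build_index(tokens)

scanned, inspected = scan_match(tokens, start=5, target_type="PERIOD")
jumped = wildcard_match(index, start=5, target_type="PERIOD")
```

In this toy input both strategies end up at the same PERIOD, but the scan inspects five visible tokens on the way, while the index lookup is a single binary search. The sketch only handles the simple case where the next rule element is a plain type; a complex next element would need more than a lookup by type.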
First suggestion for the syntax (** for the wildcard):
CW **{-> MARK(Type)} PERIOD;
The "**" is maybe not the best solution since it looks quite like the
quantifier *?. Introducing an actual keyword can also be problematic since
there might be a type with the same name. Maybe something like
CW #{-> MARK(Type)} PERIOD;
is better.
This rule would create an annotation from the end of each capitalized word to
the beginning of the next period, including the surrounding whitespace.
However, that whitespace can be removed with the TRIM action.