Mario Juric created UIMA-6137:
---------------------------------
Summary: Type-based filtering in Ruta rules
Key: UIMA-6137
URL: https://issues.apache.org/jira/browse/UIMA-6137
Project: UIMA
Issue Type: New Feature
Components: Ruta
Reporter: Mario Juric
The visibility concept in Ruta is not type-based but coverage-based, which
means that a filtered type hides the area it covers from the Ruta rules, i.e.
the covered area becomes invisible to the rules.
We have a use case where we only want to hide the annotations of the filtered
types themselves from the rules, not the text area they cover. Other types
found in these areas should still be considered by the rules.
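To make the difference concrete, here is a small Python sketch of the two
filtering semantics. It is only a simplified model and neither Ruta code nor
Ruta's actual implementation: it assumes that "covered" means any overlap with
the span of a filtered annotation, and the type names (Title, Body, Keyword)
and function names are hypothetical, chosen for illustration.

```python
from typing import List, NamedTuple, Set


class Annotation(NamedTuple):
    type: str   # hypothetical type name, e.g. "Title"
    begin: int  # start offset (inclusive)
    end: int    # end offset (exclusive)


def coverage_based_visible(annotations: List[Annotation],
                           filtered: Set[str]) -> List[Annotation]:
    """Simplified model of the current behaviour: every annotation that
    overlaps the area covered by a filtered type is hidden."""
    hidden_spans = [(a.begin, a.end) for a in annotations if a.type in filtered]

    def overlaps_hidden(a: Annotation) -> bool:
        return any(a.begin < end and begin < a.end for begin, end in hidden_spans)

    return [a for a in annotations if not overlaps_hidden(a)]


def type_based_visible(annotations: List[Annotation],
                       filtered: Set[str]) -> List[Annotation]:
    """Requested behaviour: only annotations of the filtered types are
    hidden; other annotations in the same area stay visible."""
    return [a for a in annotations if a.type not in filtered]


# Hypothetical document: a Title area containing a Keyword, then a Body area
# containing another Keyword.
annotations = [
    Annotation("Title", 0, 10),
    Annotation("Keyword", 2, 6),
    Annotation("Body", 10, 40),
    Annotation("Keyword", 15, 20),
]

by_coverage = coverage_based_visible(annotations, {"Title"})
by_type = type_based_visible(annotations, {"Title"})
```

In this model, filtering Title by coverage also hides the Keyword at offsets
2-6, whereas the type-based variant hides only the Title annotation itself and
keeps that Keyword available to the rules.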
We use Ruta as part of the normalization process where we have different text
areas marked with annotations associated with the tags in the original content
(title, abstract/summary, body, COI, authors, citations etc.), and Ruta is part
of the parsing process that produces this view. Using only the content
annotations, Ruta is then used to mark up which areas to include in a new view
for doing NLP. This approach gives us maximum traceability of the normalization
process.
However, the different types of content annotations can sometimes interfere
with the rules in ways beyond our control, and our current solution leads to
awkward rules that are hard to verify and to a less performant implementation.
In our case the problem would be better solved if we were able to tell Ruta
simply to ignore certain types, i.e. to make annotations of those types
invisible to the Ruta rules. Preferably, we want to be able to add and remove
filtered types in the script, similar to how it works with the coverage-based
type filter.
Please see also this mailing list thread where a toy example of the problem is
discussed:
https://lists.apache.org/thread.html/604417ac76ab85fc8d87eef12d4696b89d3257b7a53719518d9f5408@%3Cuser.uima.apache.org%3E