Mario Juric created UIMA-6137:
---------------------------------

             Summary: Type-based filtering in Ruta rules
                 Key: UIMA-6137
                 URL: https://issues.apache.org/jira/browse/UIMA-6137
             Project: UIMA
          Issue Type: New Feature
          Components: Ruta
            Reporter: Mario Juric


The visibility concept in Ruta is not type-based but type coverage-based, which 
means that filtered types hide the area they cover from the Ruta rules, i.e. 
these areas become invisible to the rules.

We have a use case where we only want to hide the annotations of certain types 
from the rules, not the text area they cover: other types found in these 
areas should still be considered by the rules.
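
To make the difference concrete, here is a small toy model in Python. It is 
not how Ruta implements visibility; the Annotation class, the two functions 
and the "fully contained in a filtered span" rule are assumptions made only 
for illustration. It contrasts hiding everything inside a filtered 
annotation's span with hiding only the annotations of the filtered type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical stand-in for a typed annotation over a text span."""
    type: str
    begin: int
    end: int


def visible_coverage_based(annotations, filtered_types):
    """Toy model of the current behaviour: a filtered type hides itself
    and every annotation lying inside the area it covers (assumed rule)."""
    hidden_spans = [(a.begin, a.end) for a in annotations
                    if a.type in filtered_types]

    def covered(a):
        return any(b <= a.begin and a.end <= e for b, e in hidden_spans)

    return [a for a in annotations if not covered(a)]


def visible_type_based(annotations, filtered_types):
    """Toy model of the requested behaviour: only annotations of the
    filtered types are hidden; the covered area stays visible."""
    return [a for a in annotations if a.type not in filtered_types]


# A "Title" content annotation covering two words, plus one word outside it.
annotations = [
    Annotation("Title", 0, 11),
    Annotation("Word", 0, 5),
    Annotation("Word", 6, 11),
    Annotation("Word", 12, 16),
]

coverage_result = visible_coverage_based(annotations, {"Title"})
type_result = visible_type_based(annotations, {"Title"})
```

In this sketch, filtering "Title" by coverage leaves only the word outside 
the title visible, while filtering by type leaves all three words visible.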

We use Ruta as part of the normalization process where we have different text 
areas marked with annotations associated with the tags in the original content 
(title, abstract/summary, body, COI, authors, citations etc.), and Ruta is part 
of the parsing process that produces this view. Using only the content 
annotations, Ruta is then used to mark up which areas to include in a new view 
for doing NLP. This approach gives us maximum traceability of the normalization 
process.

However, the different types of content annotations can sometimes interfere 
with the rules in ways beyond our control, and our current solution leads to 
awkward rules that are hard to verify and to a less performant implementation. 
In our case the problem would be better solved if we could simply tell Ruta to 
ignore certain types, i.e. make them invisible to the Ruta rules. Preferably, 
we want to be able to add and remove filtered types in the script, similar to 
how it works with the coverage-based type filter.
Please see also this mailing list thread where a toy example of the problem is 
discussed:
 
[https://lists.apache.org/thread.html/604417ac76ab85fc8d87eef12d4696b89d3257b7a53719518d9f5408@<user.uima.apache.org>|https://lists.apache.org/thread.html/604417ac76ab85fc8d87eef12d4696b89d3257b7a53719518d9f5408@%3Cuser.uima.apache.org%3E]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
