Peter Klügl created UIMA-2397:
---------------------------------

             Summary: TextMarker: Improve overall functionality in use cases with very large artifacts
                 Key: UIMA-2397
                 URL: https://issues.apache.org/jira/browse/UIMA-2397
             Project: UIMA
          Issue Type: Improvement
          Components: TextMarker
            Reporter: Peter Klügl
            Assignee: Peter Klügl


TextMarker is currently not usable for very large artifacts, e.g.,
documents with 500k to 1M tokens.
Adapt or replace the rule language so that the user can handle such texts:
- Reduce the memory footprint of the TextMarkerBasic inference annotations,
or make it configurable.
- Add the concept of simple rules that match only on a single regular
expression and add annotations without requiring inference annotations
(related to UIMA-2331).
- Allow the user to skip seeding at engine startup and instead apply the
seeders to selected annotations during rule inference.
- Introduce language concepts that enable the user to split a document into
multiple CASs.
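
As a rough illustration of the "simple rule" idea in the second item, the
following Java sketch applies one regular expression directly to the document
text and records only the matched spans, so no per-token helper annotations
are created. It uses only java.util.regex; the names SimpleRegexRule, Span,
and apply are hypothetical and are not part of the TextMarker or UIMA API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: none of these names exist in TextMarker or UIMA.
final class SimpleRegexRule {

    /** A typed character span, standing in for a UIMA annotation. */
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /**
     * Scans the text once with a single regular expression and returns one
     * span per match. Memory grows with the number of matches, not with the
     * number of tokens in the document.
     */
    static List<Span> apply(String type, String regex, CharSequence text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        List<Span> spans = new ArrayList<>();
        while (matcher.find()) {
            spans.add(new Span(type, matcher.start(), matcher.end()));
        }
        return spans;
    }
}
```

In a real analysis engine, each span would presumably be turned into an
annotation in the CAS; that step is omitted here because it depends on the
type system in use.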
