Peter Klügl created UIMA-2397:
---------------------------------
Summary: TextMarker: Improve overall functionality in use cases with very large artifacts
Key: UIMA-2397
URL: https://issues.apache.org/jira/browse/UIMA-2397
Project: UIMA
Issue Type: Improvement
Components: TextMarker
Reporter: Peter Klügl
Assignee: Peter Klügl
TextMarker is currently not applicable in use cases with very large artifacts,
e.g., documents with 500k to 1M tokens.
Adapt or replace parts of the rule language so that users can handle such texts:
- reduce the memory profile of the TextMarkerBasic inference annotations, or
make it configurable.
- add the concept of simple rules that match only a single regular expression
and add annotations without creating inference annotations (related to
UIMA-2331).
- allow the user to skip seeding at engine startup and instead apply the
seeders to selected annotations during rule inference.
- introduce language concepts that enable the user to split documents into
multiple CASs.
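To illustrate the "simple rule" idea from the list above, here is a minimal sketch in plain Java. It does not use the TextMarker or UIMA API: the names SimpleRule and Span are hypothetical stand-ins, and a real implementation would create annotations in the CAS instead of returning Span records. The point it shows is that such a rule only keeps the matched spans, so memory grows with the number of matches rather than with one inference annotation per token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleRegexRuleSketch {

    /** Hypothetical stand-in for an annotation: a type name plus character offsets. */
    record Span(String type, int begin, int end) {}

    /**
     * Hypothetical "simple rule": one regular expression and one annotation type.
     * It scans the raw text directly and never builds per-token inference annotations.
     */
    record SimpleRule(Pattern pattern, String type) {
        List<Span> apply(CharSequence document) {
            List<Span> result = new ArrayList<>();
            Matcher m = pattern.matcher(document);
            while (m.find()) {
                // Only the matched region is stored, nothing else about the document.
                result.add(new Span(type, m.start(), m.end()));
            }
            return result;
        }
    }

    public static void main(String[] args) {
        SimpleRule yearRule = new SimpleRule(Pattern.compile("\\b(19|20)\\d{2}\\b"), "Year");
        String text = "Founded in 1998, relaunched in 2012.";
        for (Span s : yearRule.apply(text)) {
            System.out.println(s.type() + " [" + s.begin() + "," + s.end() + "): "
                    + text.substring(s.begin(), s.end()));
        }
    }
}
```

Running main prints "Year [11,15): 1998" and "Year [31,35): 2012". Requires Java 16 or later for records.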