[ https://issues.apache.org/jira/browse/UIMA-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Klügl resolved UIMA-2397.
-------------------------------
Resolution: Fixed
done
> TextMarker: Improve overall functionality in use cases with very large
> artifacts
> --------------------------------------------------------------------------------
>
> Key: UIMA-2397
> URL: https://issues.apache.org/jira/browse/UIMA-2397
> Project: UIMA
> Issue Type: Improvement
> Components: TextMarker
> Affects Versions: 2.0.0TextMarker
> Reporter: Peter Klügl
> Assignee: Peter Klügl
> Fix For: 2.0.1TextMarker
>
>
> TextMarker is not applicable to use cases with very large artifacts, e.g.,
> documents with 500k to 1M tokens.
> Adapt or replace the rule language so that users can handle such texts:
> - reduce the memory footprint of the TextMarkerBasic inference annotations,
> or make it configurable.
> - add the concept of simple rules that match on only a single regular
> expression and add annotations without creating inference annotations
> (related to UIMA-2331).
> - allow the user to skip seeding at engine startup and to apply the seeders
> to certain annotations during rule inference.
> - introduce language concepts that enable the user to split documents into
> multiple CASs.
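
To illustrate two of the points above (regex-only rules and splitting a
document into several pieces), here is a minimal sketch in plain Java. It is
not TextMarker code and uses no UIMA API: the class name LargeTextSketch and
the methods matchSpans and splitOffsets are hypothetical, and the offsets they
return merely stand in for annotations and for the text ranges that would go
into separate CASs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of TextMarker or UIMA.
final class LargeTextSketch {

    // Returns the [begin, end) offsets of every match of one regular
    // expression. No per-token helper objects are created, which is the
    // idea behind "simple rules" without inference annotations.
    static List<int[]> matchSpans(CharSequence text, Pattern pattern) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            spans.add(new int[] { m.start(), m.end() });
        }
        return spans;
    }

    // Splits the text into [begin, end) chunks of at most maxChars
    // characters, cutting after whitespace where possible so that tokens
    // stay intact. Each chunk could then be processed on its own.
    static List<int[]> splitOffsets(CharSequence text, int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        List<int[]> chunks = new ArrayList<>();
        int begin = 0;
        while (begin < text.length()) {
            int end = Math.min(begin + maxChars, text.length());
            if (end < text.length()) {
                // Walk back to the last whitespace inside the window.
                int cut = end;
                while (cut > begin && !Character.isWhitespace(text.charAt(cut - 1))) {
                    cut--;
                }
                if (cut > begin) {
                    end = cut; // otherwise no whitespace was found: hard cut
                }
            }
            chunks.add(new int[] { begin, end });
            begin = end;
        }
        return chunks;
    }

    public static void main(String[] args) {
        String text = "order 12 shipped, order 345 pending";
        for (int[] s : matchSpans(text, Pattern.compile("\\d+"))) {
            System.out.println(text.substring(s[0], s[1]));
        }
        for (int[] c : splitOffsets(text, 16)) {
            System.out.println("[" + c[0] + ", " + c[1] + ")");
        }
    }
}
```

The chunks cover the text without gaps or overlap, so offsets found in one
chunk can be mapped back to the original document by adding the chunk's begin
offset. A match that spans a chunk boundary would be missed by this sketch.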