[ https://issues.apache.org/jira/browse/UIMA-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184082#comment-13184082 ]
Peter Klügl commented on UIMA-2233:
-----------------------------------
Additional information about tags, in particular which (HTML) tags surround a
token, will be removed in this issue: the information is stored in the
inference annotation but is only set by the seed lexer. I will create a new
issue to improve HTML support again.
> Make the seeding configurable and independently of the rule inference
> ---------------------------------------------------------------------
>
> Key: UIMA-2233
> URL: https://issues.apache.org/jira/browse/UIMA-2233
> Project: UIMA
> Issue Type: New Feature
> Components: TextMarker
> Reporter: Peter Klügl
> Assignee: Peter Klügl
>
> The seeding needs to become more configurable: the user should be able to
> choose the seeder, or select given annotation types for the initial inference
> annotations (TextMarkerBasic). Both options need to be configurable in the
> analysis engine descriptor. One possible approach to more configurable
> seeding is to use the rule-based ICU tokenizer, which would replace the
> JFlex lexer.
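A minimal sketch of what a selectable, rule-based seeder could look like. It assumes a hypothetical Seeder interface and a hypothetical seeder name that would be read from a configuration parameter in the analysis engine descriptor; none of these names come from TextMarker. To keep the example dependency-free, the JDK's java.text.BreakIterator stands in for the ICU tokenizer. ICU4J's com.ibm.icu.text.BreakIterator has a similar API and additionally accepts custom break rules through RuleBasedBreakIterator.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SeederSketch {

    /** A token span with begin/end offsets, standing in for an initial annotation. */
    record Span(int begin, int end, String text) {}

    /** Hypothetical pluggable seeder interface (not part of the TextMarker API). */
    interface Seeder {
        List<Span> seed(String document);
    }

    /** Rule-based seeding via the JDK word BreakIterator. */
    static final class BreakIteratorSeeder implements Seeder {
        private final Locale locale;

        BreakIteratorSeeder(Locale locale) {
            this.locale = locale;
        }

        @Override
        public List<Span> seed(String document) {
            BreakIterator words = BreakIterator.getWordInstance(locale);
            words.setText(document);
            List<Span> spans = new ArrayList<>();
            int start = words.first();
            for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
                String piece = document.substring(start, end);
                // Skip whitespace-only segments; keep words and punctuation.
                if (!piece.isBlank()) {
                    spans.add(new Span(start, end, piece));
                }
            }
            return spans;
        }
    }

    /** Picks a seeder by name, as a (hypothetical) descriptor parameter might. */
    static Seeder createSeeder(String name) {
        switch (name) {
            case "breakIterator":
                return new BreakIteratorSeeder(Locale.ENGLISH);
            default:
                throw new IllegalArgumentException("Unknown seeder: " + name);
        }
    }

    public static void main(String[] args) {
        Seeder seeder = createSeeder("breakIterator");
        for (Span s : seeder.seed("Make the seeding configurable.")) {
            System.out.println(s.begin() + "-" + s.end() + " " + s.text());
        }
    }
}
```

Keeping the seeder behind an interface means a lexer-based implementation and a tokenizer-based one can coexist, and the descriptor only has to name which one to instantiate.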
--
This message is automatically generated by JIRA.