[ https://issues.apache.org/jira/browse/UIMA-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13669386#comment-13669386 ]
Marshall Schor commented on UIMA-2947:
--------------------------------------
When suggesting or making changes, please also consider backwards
compatibility. I may be wrong (I haven't really looked into it), but it
seems that if you remove multiWordSeparator as an allowed setting, many
older, existing uses of the Dictionary Annotator may start to fail. Did I
misunderstand?
> Improve format of multi-word entries in dictionary files
> --------------------------------------------------------
>
> Key: UIMA-2947
> URL: https://issues.apache.org/jira/browse/UIMA-2947
> Project: UIMA
> Issue Type: Improvement
> Components: Sandbox-DictionaryAnnotator
> Environment: Linux
> Reporter: Armin Wegner
> Labels: XML, dictionary
>
> Using a single character to separate tokens in a Dictionary Annotator's
> dictionary file is not XML-like. It looks like a remnant from old
> comma-separated-value days. So remove multiWordSeparator from
> dictionaryMetaData and let an entry look like
> <entry><key><token>AOL</token><token>Mail</token></key></entry> or
> <entry><key><token>azbuz</token><token>.</token><token>com</token></key></entry>.
> By the way, what is <key> good for? Do we need it?
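The Java sketch below shows one way a reader could accept both entry forms during a transition period, so that existing dictionaries keep working while new ones use per-token elements. It makes several assumptions. The legacy form is assumed to place the whole multi-word entry as text inside <key> and to split it on the multiWordSeparator declared in dictionaryMetaData. The proposed form is the <token> layout from the issue. The class and method names (DictionaryEntryReader, readTokens) are hypothetical and are not part of the Dictionary Annotator's API. Only the JDK's standard DOM parser is used.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/**
 * Hypothetical helper (not part of the UIMA Dictionary Annotator): returns the
 * tokens of a single <entry>, accepting both the proposed <token> form and an
 * assumed legacy form where <key> holds separator-delimited text.
 */
public class DictionaryEntryReader {

    public static List<String> readTokens(String entryXml, String multiWordSeparator) {
        Element entry;
        try {
            entry = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(entryXml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse entry: " + entryXml, e);
        }
        Element key = (Element) entry.getElementsByTagName("key").item(0);

        List<String> tokens = new ArrayList<>();
        NodeList tokenNodes = key.getElementsByTagName("token");
        if (tokenNodes.getLength() > 0) {
            // Proposed form: one <token> element per token, no separator needed.
            for (int i = 0; i < tokenNodes.getLength(); i++) {
                tokens.add(tokenNodes.item(i).getTextContent());
            }
        } else if (multiWordSeparator == null || multiWordSeparator.isEmpty()) {
            // No separator configured: treat the whole key text as one token.
            tokens.add(key.getTextContent());
        } else {
            // Assumed legacy form: split the key text on the configured separator,
            // quoted so characters such as '|' or '.' are taken literally.
            for (String t : key.getTextContent().split(Pattern.quote(multiWordSeparator))) {
                if (!t.isEmpty()) {
                    tokens.add(t);
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(readTokens(
                "<entry><key><token>AOL</token><token>Mail</token></key></entry>", null));
        System.out.println(readTokens("<entry><key>azbuz|.|com</key></entry>", "|"));
    }
}
```

Running main prints [AOL, Mail] and then [azbuz, ., com]. The reader checks for <token> children first, so multiWordSeparator would only matter for dictionaries that still use the old form. Under this design, the separator could be deprecated rather than removed outright.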