[ https://issues.apache.org/jira/browse/UIMA-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13669386#comment-13669386 ]

Marshall Schor commented on UIMA-2947:
--------------------------------------

When suggesting or making changes, please also consider backwards 
compatibility.  I may be wrong (I haven't looked into it closely), but it 
seems that if multiWordSeparator is removed as an allowed attribute, many 
existing uses of the Dictionary Annotator may start to fail.  Did I 
misunderstand?
                
> Improve format of multi-word entries in dictionary files
> --------------------------------------------------------
>
>                 Key: UIMA-2947
>                 URL: https://issues.apache.org/jira/browse/UIMA-2947
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Sandbox-DictionaryAnnotator
>         Environment: Linux
>            Reporter: Armin Wegner
>              Labels: XML, dictionary
>
> Using a single character to separate tokens in a Dictionary Annotator's 
> dictionary file is not XML-like. It looks like a remnant of the old 
> comma-separated-value days. So remove multiWordSeparator from 
> dictionaryMetaData and let an entry look like 
> <entry><key><token>AOL</token><token>Mail</token></key></entry> or 
> <entry><key><token>azbuz</token><token>.</token><token>com</token></key></entry>.
> By the way, what is <key> good for? Do we need it?
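
One way to reconcile the proposal with the backwards-compatibility concern 
would be a reader that accepts both forms. The following is only a minimal 
sketch in Python, not the Dictionary Annotator's actual (Java) parser. The 
function name key_tokens and its multi_word_separator parameter are 
hypothetical, and the legacy layout (a separator-joined string directly 
inside <key>) is an assumption based on the description above.

```python
import xml.etree.ElementTree as ET


def key_tokens(entry_xml, multi_word_separator=None):
    """Return the tokens of one <entry>, accepting both entry formats.

    Hypothetical helper: not part of the Dictionary Annotator API.
    """
    key = ET.fromstring(entry_xml).find("key")
    if key is None:
        raise ValueError("entry has no <key> element")

    # Proposed format: one <token> child per token.
    token_elems = key.findall("token")
    if token_elems:
        return [t.text or "" for t in token_elems]

    # Assumed legacy format: plain text, split on the configured separator.
    text = (key.text or "").strip()
    if multi_word_separator:
        return [part for part in text.split(multi_word_separator) if part]
    return [text] if text else []


new_style = "<entry><key><token>AOL</token><token>Mail</token></key></entry>"
old_style = "<entry><key>AOL Mail</key></entry>"

print(key_tokens(new_style))
print(key_tokens(old_style, multi_word_separator=" "))
```

With a fallback like this, existing dictionaries that still rely on 
multiWordSeparator would keep loading, while new dictionaries could use 
<token> elements, including tokens such as "." that a single separator 
character handles awkwardly.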

