[
https://issues.apache.org/jira/browse/UIMA-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726700#comment-16726700
]
Peter Klügl commented on UIMA-5680:
-----------------------------------
I thought this would be something different, but it is actually caused by the
known problems with whitespace in the trie lookup. The problem can easily be
fixed by removing the space in the dictionary, e.g., by activating the
corresponding analysis engine parameter. Sorry that I did not take a closer
look earlier.
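
A small standalone sketch of what "removing the space in the dictionary" means
for the entries from the report. This is plain Python, not Ruta, and
strip_whitespace is a hypothetical helper introduced only for illustration. It
shows the entries after whitespace removal and makes no claim about how the
trie lookup is implemented internally.

```python
def strip_whitespace(entries):
    """Return the entries with all whitespace characters removed.

    Hypothetical helper: it only illustrates the workaround of removing
    spaces from dictionary entries; it is not part of Ruta.
    """
    return ["".join(entry.split()) for entry in entries]


# Contents of dict.txt as given in the report.
dict_entries = ["and/or", "and or"]

# After whitespace removal the two entries are still distinct strings,
# but neither contains a space any more.
normalized = strip_whitespace(dict_entries)
print(normalized)  # ['and/or', 'andor']
```

The two entries stay distinguishable after normalization; only the space that
interferes with the lookup is gone.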
> Special characters in MARKFAST dictionaries mask entries
> --------------------------------------------------------
>
> Key: UIMA-5680
> URL: https://issues.apache.org/jira/browse/UIMA-5680
> Project: UIMA
> Issue Type: Bug
> Components: Ruta
> Affects Versions: 2.6.1ruta
> Reporter: Hugues de Mazancourt
> Assignee: Peter Klügl
> Priority: Major
> Fix For: 2.7.0ruta
>
> Attachments: Slash.ruta, dict.txt, text.txt
>
>
> It seems that two entries in a MARKFAST dictionary that differ only by a
> special character make MARKFAST ignore some entries.
> My script is:
> DECLARE AndOr;
> Document{->MARKFAST(AndOr, 'dict.txt', true)};
> My dict.txt contains:
> and/or
> and or
> On the following text: "knowledge of java and/or php and or Groovy is a
> plus", only the second occurrence, "and or" (without the slash), is marked.
> If I remove the "unslashed" entry from the dict.txt file, "and/or" is
> correctly marked. This also happens with other separators, such as "+", ".",
> etc., and when two entries share the same prefix. For example, if you add
> "and/or php" to dict.txt, it won't be marked.