[ 
https://issues.apache.org/jira/browse/UIMA-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hugues de Mazancourt updated UIMA-5680:
---------------------------------------
    Description: 
When two entries in a MARKFAST dictionary differ only by a special 
character, MARKFAST seems to ignore some of them.
My script is:

DECLARE AndOr;
Document{->MARKFAST(AndOr, 'dict.txt', true)};

My dict.txt contains
and/or
and or

On the following text: "knowledge of java and/or php and or Groovy is a plus", 
only "and or" (the second entry, without the slash) is marked. If I remove the 
"unslashed" entry from the dict.txt file, "and/or" is correctly marked.
This also happens with other separators, such as "+", ".", etc., and even when 
two entries share the same prefix. For example, if you add "and/or php" to 
dict.txt, it won't be marked.

  was:
When two entries in a MARKFAST dictionary differ only by a special 
character, MARKFAST seems to ignore some of them.
My script is:
{{DECLARE AndOr;
Document{->MARKFAST(AndOr, 'dict.txt', true)};
}}
My dict.txt contains
{{and/or
and or}}

On the following text: "knowledge of java and/or php and or Groovy is a plus", 
only "and or" (the second entry, without the slash) is marked. If I remove the 
"unslashed" entry from the dict.txt file, "and/or" is correctly marked.
This also happens with other separators, such as "+", ".", etc., and even when 
two entries share the same prefix. For example, if you add "and/or php" to 
dict.txt, it won't be marked.
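
For reference, the behaviour the report expects can be sketched outside Ruta. 
The snippet below is only an illustration: it uses a plain case-insensitive 
substring search, which is an assumption and not how MARKFAST is implemented, 
and the helper name expected_matches is hypothetical. It lists the spans a 
reader would expect to be marked for the text and dictionary entries quoted 
above.

```python
import re


def expected_matches(text, entries, ignore_case=True):
    """Return sorted (begin, end, covered_text) spans for each entry found in text.

    Naive reference matcher (hypothetical helper): every dictionary entry is
    searched literally, so entries that differ only by a special character
    cannot mask one another.
    """
    flags = re.IGNORECASE if ignore_case else 0
    spans = []
    for entry in entries:
        # re.escape keeps "/", "+", "." and similar characters literal.
        for match in re.finditer(re.escape(entry), text, flags):
            spans.append((match.start(), match.end(), match.group()))
    return sorted(spans)


text = "knowledge of java and/or php and or Groovy is a plus"
entries = ["and/or", "and or"]  # contents of dict.txt from the report

for span in expected_matches(text, entries):
    print(span)
```

With the two entries from the report, this lists one span for "and/or" and one 
for "and or"; adding "and/or php" to the entries yields a third, longer span 
starting at the same offset as "and/or".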


> Special characters in MARKFAST dictionaries mask entries
> --------------------------------------------------------
>
>                 Key: UIMA-5680
>                 URL: https://issues.apache.org/jira/browse/UIMA-5680
>             Project: UIMA
>          Issue Type: Bug
>          Components: Ruta
>    Affects Versions: 2.6.1ruta
>            Reporter: Hugues de Mazancourt
>         Attachments: Slash.ruta, dict.txt, text.txt



