[ https://issues.apache.org/jira/browse/UIMA-6001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rob van Dalen updated UIMA-6001:
--------------------------------
    Affects Version/s:     (was: 2.6.1ruta)
                           2.7.0ruta

> Problem with matching items in MarkFast with whitespacers visible
> -----------------------------------------------------------------
>
>                 Key: UIMA-6001
>                 URL: https://issues.apache.org/jira/browse/UIMA-6001
>             Project: UIMA
>          Issue Type: Bug
>          Components: Ruta
>    Affects Versions: 2.7.0ruta
>            Reporter: Rob van Dalen
>            Assignee: Peter Klügl
>            Priority: Major
>
> The change / fix in UIMA-4556 causes some problems when using a CSV file
> that contains whitespace.
> When we have a dictionary with whitespace between words and
> >> Param PARAM_DICT_REMOVE_WS is TRUE:
> When WS are visible in the token stream:
> - entries containing whitespace are not recognized (as expected).
> When WS are NOT visible in the token stream:
> - all items in the dictionary are recognized
> - all items are also recognized if you add whitespace between words.
> For example: IlikeRUTA, Ilike Ruta, I like Ruta all result in the same match.
> >> Param PARAM_DICT_REMOVE_WS is FALSE:
> When WS are visible in the token stream:
> - not all entries in the dictionary are recognized
> When WS are NOT visible in the token stream:
> - again, not all entries in the dictionary are recognized
> The cause of this problem is that the flag for ignoring whitespace always
> defaults to true (hardcoded):
> {code:java}
> private IBooleanExpression ignoreWS = new SimpleBooleanExpression(true);
> {code}
> This is not correct: when whitespace is significant, the matcher cannot take
> it into account. The matcher should use the same value as set in
> the PARAM_DICT_REMOVE_WS parameter or the value that is set via the
> setIgnoreWS method.
> -I attached a patch to fix this issue.-
> I'm working on a patch.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
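
The reported behaviour can be illustrated with a small standalone sketch. It is not the Ruta implementation: `IgnoreWsSketch` and `DictMatcher` are hypothetical names, and the case-insensitive comparison is an assumption made only because the example in the report mixes "RUTA" and "Ruta". The sketch shows why a whitespace flag fixed to true makes "IlikeRUTA", "Ilike Ruta" and "I like Ruta" collapse into one match, while a flag taken from configuration keeps them apart.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class IgnoreWsSketch {

    /** Hypothetical stand-in for a dictionary matcher; not the Ruta API. */
    static final class DictMatcher {
        private final List<String> entries = new ArrayList<>();
        private final boolean ignoreWs;

        DictMatcher(List<String> rawEntries, boolean ignoreWs) {
            // The flag comes from the caller instead of being fixed to true.
            this.ignoreWs = ignoreWs;
            for (String entry : rawEntries) {
                entries.add(normalize(entry));
            }
        }

        /** Lower-cases (assumption) and strips whitespace only when requested. */
        private String normalize(String s) {
            String lower = s.toLowerCase(Locale.ROOT);
            return ignoreWs ? lower.replaceAll("\\s+", "") : lower;
        }

        boolean matches(String candidate) {
            return entries.contains(normalize(candidate));
        }
    }

    public static void main(String[] args) {
        List<String> dict = Arrays.asList("I like Ruta");
        DictMatcher alwaysIgnoring = new DictMatcher(dict, true);  // mimics the hardcoded default
        DictMatcher configured = new DictMatcher(dict, false);     // whitespace is significant

        for (String input : new String[] {"IlikeRUTA", "Ilike Ruta", "I like Ruta"}) {
            System.out.println(input
                    + " | ignoreWs=true: " + alwaysIgnoring.matches(input)
                    + " | ignoreWs=false: " + configured.matches(input));
        }
    }
}
```

With the flag set to true every variant matches the single dictionary entry; with the flag set to false only the exact spacing "I like Ruta" does, which is the distinction the report asks the matcher to respect.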