[
https://issues.apache.org/jira/browse/UIMA-5752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16406089#comment-16406089
]
Peter Klügl commented on UIMA-5752:
-----------------------------------
Hi,
since your contributions are non-trivial, you may want to consider signing and
submitting an ICLA (and possibly a CCLA as well?). See
[https://uima.apache.org/get-involved.html]
If you have any questions about the ICLA or the problem in this ticket, let
me know. Btw, I created a ticket for adding an alternative dictionary lookup
that covers 95% of the use cases and causes fewer problems. Unfortunately, I
won't find the time for it before (the end of) April.
Best,
Peter
> Problem with matching items in MarkTable with whitespacers visible
> ------------------------------------------------------------------
>
> Key: UIMA-5752
> URL: https://issues.apache.org/jira/browse/UIMA-5752
> Project: UIMA
> Issue Type: Bug
> Components: Ruta
> Affects Versions: 2.6.1ruta
> Reporter: Jasper Huzen
> Assignee: Peter Klügl
> Priority: Major
>
> The change / fix in UIMA-4556 causes some problems when using a CSV file with
> whitespaces.
> When we have a dictionary with whitespaces between words and
> >> Param PARAM_DICT_REMOVE_WS is TRUE:
> When WS are visible in the token stream:
> - words with whitespaces are not recognized (as expected).
> When WS are NOT visible in the token stream:
> - all items in the dictionary will be recognized
> - all items will also be recognized if you add whitespaces between words.
> For example: IlikeRUTA, Ilike Ruta, I like Ruta all result in the same match.
> >> Param PARAM_DICT_REMOVE_WS is FALSE:
> When WS are visible in the token stream:
> - not all entries in the dictionary will be recognized
> When WS are NOT visible in the token stream:
> - also not all entries in the dictionary will be recognized
> The cause of this problem is that the default value for ignoring whitespaces
> is always true (hardcoded).
> {code:java}
> private IBooleanExpression ignoreWS = new SimpleBooleanExpression(true);
> {code}
> This is not correct, because matching that respects whitespaces (when they
> are important) will not work. The matcher should use the value set in
> the PARAM_DICT_REMOVE_WS parameter or the value set via the setIgnoreWS
> method.
> -I attached a patch to fix this issue.-
> I'm working on a patch.
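
The behaviour described above can be illustrated with a minimal, self-contained sketch. This is not the Ruta implementation: the class DictLookupSketch and its methods are hypothetical, and the only assumption is that "ignoring whitespace" means stripping whitespace from both the dictionary entries and the candidate text before comparing them. Case handling is deliberately left out.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a dictionary lookup with a configurable
// ignore-whitespace flag. Not taken from the Ruta code base.
class DictLookupSketch {
    private final Set<String> entries = new LinkedHashSet<>();
    private final boolean ignoreWS;

    // The flag is passed in rather than hardcoded to true, so the caller
    // (for example a configuration parameter) decides how matching works.
    DictLookupSketch(List<String> dictionary, boolean ignoreWS) {
        this.ignoreWS = ignoreWS;
        for (String entry : dictionary) {
            entries.add(normalize(entry));
        }
    }

    // Strips all whitespace only when the flag is set.
    private String normalize(String s) {
        return ignoreWS ? s.replaceAll("\\s+", "") : s;
    }

    boolean matches(String candidate) {
        return entries.contains(normalize(candidate));
    }

    public static void main(String[] args) {
        List<String> dict = List.of("I like Ruta");
        DictLookupSketch lenient = new DictLookupSketch(dict, true);
        DictLookupSketch strict = new DictLookupSketch(dict, false);
        for (String s : List.of("IlikeRuta", "Ilike Ruta", "I like Ruta")) {
            System.out.println(s + " ignoreWS=true: " + lenient.matches(s)
                    + ", ignoreWS=false: " + strict.matches(s));
        }
    }
}
```

With the flag set to true, all three spellings match the single entry. With the flag set to false, only the exact spelling "I like Ruta" matches. A hardcoded true therefore makes whitespace-sensitive matching impossible, whatever the configuration says.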
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)