[ 
https://issues.apache.org/jira/browse/UIMA-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Klügl resolved UIMA-5775.
-------------------------------
    Resolution: Fixed

> Performance problem MARKTABLE when matching case insensitive
> ------------------------------------------------------------
>
>                 Key: UIMA-5775
>                 URL: https://issues.apache.org/jira/browse/UIMA-5775
>             Project: UIMA
>          Issue Type: Bug
>          Components: Ruta
>    Affects Versions: 2.6.1ruta
>            Reporter: Jasper Huzen
>            Assignee: Peter Klügl
>            Priority: Major
>             Fix For: 2.6.2ruta
>
>         Attachments: UIMA-5775.patch
>
>
> Hi,
> We encounter a performance issue (or possibly an infinite loop) when we use the 
> MARKTABLE action with case-insensitive value lists.
> The call in our script is:
> {code:java}
> ADDRETAINTYPE(WS);
> MARKTABLE(LawName, 1, 'nl_law_names.ignorecase.csv', true, 0, "", 0, 
> "lawIdentifier" = 2);{code}
> The following input fragment results in a timeout exception after 1 
> minute.
> {code:java}
> Groenboek COM(2006) 105 definitief een Europese strategie voor duurzame, 
> concurrerende en continu geleverde energie voor Europa {SEC(2006)317}{code}
> That complete name is a Dutch law name and is also an entry in the 
> _nl_law_names.csv_ file.
> When we try to match it with the ignoreCase flag set to false, there is no 
> problem and matching is fast. If we set that flag to true (case is ignored), 
> matching is very slow or even hangs in an infinite loop.
> I debugged the code, which pointed me to the _TreeWordList_ class. The 
> recursive method _recursiveContains_ has a potential bug. 
> I think the problem occurs when the item contains a special character that 
> is identical in upper and lower case. The recursive method then 
> forks twice on the same tree item.
> I made a fix that checks whether the uppercase character is the same as the 
> lowercase character, and in that case makes the recursive call only once. 
> That solved the (performance) issue, but I'm not sure whether this is really 
> the main problem or whether the current fix is the best fix for it.
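
The double descent described in the report, and the guard the reporter proposes, can be sketched as follows. This is a minimal illustration on a hypothetical trie, not the actual _TreeWordList_ code and not the attached patch: the class name CaseInsensitiveTrieSketch, the skipDuplicateBranch flag, and the calls counter are all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a trie-based word list; NOT the real Ruta TreeWordList. */
class CaseInsensitiveTrieSketch {

    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean wordEnd;
    }

    private final Node root = new Node();

    /** Number of recursive lookups made by the last query (illustration only). */
    long calls;

    void add(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.wordEnd = true;
    }

    /** skipDuplicateBranch = true applies the guard described in the report. */
    boolean containsIgnoreCase(String text, boolean skipDuplicateBranch) {
        calls = 0;
        return recursiveContains(root, text, 0, skipDuplicateBranch);
    }

    private boolean recursiveContains(Node node, String text, int index, boolean guard) {
        calls++;
        if (index == text.length()) {
            return node.wordEnd;
        }
        char c = text.charAt(index);
        char lower = Character.toLowerCase(c);
        char upper = Character.toUpperCase(c);

        // First branch: follow the lowercase variant.
        Node next = node.children.get(lower);
        if (next != null && recursiveContains(next, text, index + 1, guard)) {
            return true;
        }
        // Guard: for characters such as '(', ' ' or digits, both variants are
        // identical, so the second branch would revisit the same subtree.
        if (guard && upper == lower) {
            return false;
        }
        // Second branch: follow the uppercase variant.
        next = node.children.get(upper);
        return next != null && recursiveContains(next, text, index + 1, guard);
    }

    /** Small helper so the sketch does not depend on String.repeat (Java 11+). */
    static String repeat(char c, int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        CaseInsensitiveTrieSketch trie = new CaseInsensitiveTrieSketch();
        String prefix = repeat('(', 16);
        trie.add(prefix + "a");

        // A near miss forces the lookup to exhaust every branch it opens.
        trie.containsIgnoreCase(prefix + "b", false);
        System.out.println("calls without guard: " + trie.calls);
        trie.containsIgnoreCase(prefix + "b", true);
        System.out.println("calls with guard:    " + trie.calls);
    }
}
```

In this sketch, a run of 16 '(' characters followed by a mismatching letter costs 131071 recursive calls without the guard and 17 with it, and both lookups return false. The cost without the guard doubles with every additional caseless character, which would be consistent with the timeout reported for an entry containing many spaces, digits, and brackets.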



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
