Jasper Huzen created UIMA-5775:
----------------------------------

             Summary: Performance problem MARKTABLE when matching case 
insensitive
                 Key: UIMA-5775
                 URL: https://issues.apache.org/jira/browse/UIMA-5775
             Project: UIMA
          Issue Type: Bug
          Components: Ruta
    Affects Versions: 2.6.1ruta
            Reporter: Jasper Huzen


Hi,

We encounter a performance issue (or possibly an infinite loop) when we use the 
MARKTABLE action with case-insensitive value lists.

The call in our script is:
{code:java}
ADDRETAINTYPE(WS);
MARKTABLE(LawName, 1, 'nl_law_names.ignorecase.csv', true, 0, "", 0, 
"lawIdentifier" = 2);{code}

Using the following input fragment will result in a timeout exception after 1 
minute.
{code:java}
Groenboek COM(2006) 105 definitief een Europese strategie voor duurzame, 
concurrerende en continu geleverde energie voor Europa {SEC(2006)317}{code}
That complete fragment is the name of a Dutch law and is also an entry in the 
_nl_law_names.csv_ file. 

When we try to match it with the ignoreCase flag set to false, there is no 
problem and matching is fast. If we set that flag to true (case is ignored), 
matching is really slow or even hangs in an infinite loop.

I debugged the code, which pointed me to the _TreeWordList_ class. The 
recursive method _recursiveContains_ has a potential bug. 

I think the problem occurs when the item contains a special character whose 
uppercase and lowercase forms are the same character. The recursive method 
will then look at (fork into) the same tree item twice.

I made a fix that checks whether the uppercase character is the same as the 
lowercase character, and in that case does the recursive call only once. That 
solved the performance issue, but I'm not sure whether this is really the main 
problem or whether the current fix is the best fix for it.
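
To illustrate the suspected behaviour, here is a minimal sketch using a 
hypothetical, simplified trie. It is not the actual _TreeWordList_ code; the 
class name, the `guard` parameter and the `calls` counter are assumptions made 
for illustration only. It assumes the case-insensitive lookup tries the 
lowercase branch and then the uppercase branch, and it shows how a character 
without distinct cases makes a failing lookup visit the same child twice at 
every level.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified trie; NOT the actual Ruta TreeWordList code. */
final class CaseInsensitiveTrieSketch {

    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean wordEnd;
    }

    final Node root = new Node();
    long calls; // number of recursive invocations of the last lookup

    void add(String word) {
        Node n = root;
        for (char c : word.toCharArray()) {
            n = n.children.computeIfAbsent(c, k -> new Node());
        }
        n.wordEnd = true;
    }

    /** Case-insensitive lookup; 'guard' enables the proposed check. */
    boolean contains(String text, boolean guard) {
        calls = 0;
        return contains(root, text, 0, guard);
    }

    private boolean contains(Node node, String text, int index, boolean guard) {
        calls++;
        if (index == text.length()) {
            return node.wordEnd;
        }
        char c = text.charAt(index);
        char lower = Character.toLowerCase(c);
        char upper = Character.toUpperCase(c);

        // First try the lowercase branch.
        Node viaLower = node.children.get(lower);
        if (viaLower != null && contains(viaLower, text, index + 1, guard)) {
            return true;
        }
        // Proposed check: if both case forms are the same character (digits,
        // spaces, brackets, ...), the uppercase branch is the child that was
        // just explored, so do not descend into it a second time.
        if (guard && lower == upper) {
            return false;
        }
        // Otherwise try the uppercase branch.
        Node viaUpper = node.children.get(upper);
        return viaUpper != null && contains(viaUpper, text, index + 1, guard);
    }
}
```

In this sketch, a failing lookup over n caseless characters makes on the order 
of 2^n recursive calls without the check and about n calls with it. A 
successful lookup returns on the first branch, so the blow-up only shows when 
the match fails somewhere after a run of such characters.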



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
