[
https://issues.apache.org/jira/browse/UIMA-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jasper Huzen updated UIMA-5775:
-------------------------------
Description:
Hi,
We encounter a performance issue (or possibly an infinite loop) when we use the
MARKTABLE action with case-insensitive value lists.
The call in our script is:
{code:java}
ADDRETAINTYPE(WS);
MARKTABLE(LawName, 1, 'nl_law_names.ignorecase.csv', true, 0, "", 0,
"lawIdentifier" = 2);{code}
Using the following input fragment results in a timeout exception after 1
minute.
{code:java}
Groenboek COM(2006) 105 definitief een Europese strategie voor duurzame,
concurrerende en continu geleverde energie voor Europa {SEC(2006)317}{code}
That complete name is a Dutch law name and is also an entry in the
_nl_law_names.csv_ file.
When we match it with the ignoreCase flag set to false, there is no problem
and matching is fast. If we set that flag to true (case is ignored), the
matching is very slow or even hangs in an infinite loop.
I debugged the code, which pointed me to the _TreeWordList_ class. The recursive
method _recursiveContains_ has a potential bug.
I think the problem occurs when the item contains a special character whose
uppercase and lowercase forms are the same character. The recursive method then
looks up (forks into) the same tree item twice.
I made a fix that checks whether the uppercase character is the same as the
lowercase character and, in that case, makes the recursive call only once. That
solved the (performance) issue, but I'm not sure whether this is really the main
problem or whether the current fix is the best fix for it.
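To illustrate the suspected cause, here is a minimal sketch of a case-insensitive trie lookup. It is not the actual _TreeWordList_ code: the class _CaseForkSketch_, its fields, and the _skipDuplicateBranch_ flag are hypothetical names made up for this example. It only assumes that the lookup tries the lowercase and the uppercase variant of each character in turn, and it counts the recursive calls with and without the check described above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy trie for illustration; not the Ruta TreeWordList implementation. */
final class CaseForkSketch {

  static final class Node {
    final Map<Character, Node> children = new HashMap<>();
    boolean wordEnd;
  }

  final Node root = new Node();
  long calls; // number of recursive invocations of the last lookup

  void add(String word) {
    Node node = root;
    for (char c : word.toCharArray()) {
      node = node.children.computeIfAbsent(c, k -> new Node());
    }
    node.wordEnd = true;
  }

  /** Case-insensitive lookup; the flag switches the proposed check on or off. */
  boolean contains(String text, boolean skipDuplicateBranch) {
    calls = 0;
    return recurse(root, text, 0, skipDuplicateBranch);
  }

  private boolean recurse(Node node, String text, int index, boolean skipDuplicateBranch) {
    calls++;
    if (index == text.length()) {
      return node.wordEnd;
    }
    char c = text.charAt(index);
    char lower = Character.toLowerCase(c);
    char upper = Character.toUpperCase(c);

    // First branch: follow the lowercase variant.
    Node lowerChild = node.children.get(lower);
    if (lowerChild != null && recurse(lowerChild, text, index + 1, skipDuplicateBranch)) {
      return true;
    }
    // For characters such as '(' or a digit, both variants are identical, so the
    // second branch would walk the very same subtree again.
    if (skipDuplicateBranch && upper == lower) {
      return false;
    }
    // Second branch: follow the uppercase variant.
    Node upperChild = node.children.get(upper);
    return upperChild != null && recurse(upperChild, text, index + 1, skipDuplicateBranch);
  }

  public static void main(String[] args) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 16; i++) {
      sb.append('(');
    }
    String prefix = sb.toString();

    CaseForkSketch trie = new CaseForkSketch();
    trie.add(prefix + "a");

    // A lookup that fails only at the last character forces every branch to be explored.
    trie.contains(prefix + "b", false);
    System.out.println("calls without check: " + trie.calls);
    trie.contains(prefix + "b", true);
    System.out.println("calls with check:    " + trie.calls);
  }
}
```

In this toy setup, a failing lookup over n characters without case variants takes 2^(n+1) - 1 recursive calls without the check and n + 1 calls with it. That would match the symptom of very slow matching rather than a true infinite loop, but whether the real _recursiveContains_ behaves exactly like this is an assumption.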
> Performance problem MARKTABLE when matching case insensitive
> ------------------------------------------------------------
>
> Key: UIMA-5775
> URL: https://issues.apache.org/jira/browse/UIMA-5775
> Project: UIMA
> Issue Type: Bug
> Components: Ruta
> Affects Versions: 2.6.1ruta
> Reporter: Jasper Huzen
> Priority: Major
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)