[
https://issues.apache.org/jira/browse/HIVE-11652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15764924#comment-15764924
]
Dhiraj Kumar commented on HIVE-11652:
-------------------------------------
[~jcamachorodriguez] The patch causes a performance issue.
Example query:
{code}select a from (select 1 as a ) tbl where a in
(1,2,3,4,5,6,7,8,9,10);{code}
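To reproduce the large-IN-list case discussed below (for example 50K values), a query of the same shape can be generated programmatically. This is a minimal sketch; `InListQuery` and `buildQuery` are hypothetical names, not part of Hive:

```java
import java.util.StringJoiner;

public class InListQuery {
    // Hypothetical helper: builds the example query with an IN list of n literals (1..n).
    static String buildQuery(int n) {
        StringJoiner values = new StringJoiner(",", "(", ")");
        for (int i = 1; i <= n; i++) {
            values.add(Integer.toString(i));
        }
        return "select a from (select 1 as a ) tbl where a in " + values + ";";
    }

    public static void main(String[] args) {
        // Defaults to 50,000 values; pass a different count as the first argument.
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 50_000;
        System.out.println(buildQuery(n));
    }
}
```

`buildQuery(10)` yields the example query above on a single line; `java InListQuery 50000` prints a query with 50,000 literals.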
Source of the walk method (DefaultGraphWalker, with the patch applied):
{code}
protected void walk(Node nd) throws SemanticException {
  // Push the node in the stack
  opStack.push(nd);
  // While there are still nodes to dispatch...
  while (!opStack.empty()) {
    Node node = opStack.peek();
    if (node.getChildren() == null ||
        getDispatchedList().containsAll(node.getChildren())) {
      // Dispatch current node
      if (!getDispatchedList().contains(node)) {
        dispatch(node, opStack);
        opQueue.add(node);
      }
      opStack.pop();
      continue;
    }
    // Add a single child and restart the loop
    for (Node childNode : node.getChildren()) {
      if (!getDispatchedList().contains(childNode)) {
        opStack.push(childNode);
        break;
      }
    }
  } // end while
}
{code}
The walk method pushes the root node onto the stack (the where clause in this
case, which has 12 children) and checks all of its direct children at line 166.
It processes a single child (in this example) and then invokes
node.getChildren() on the root again. A total of 12 invocations of
getChildren() are made, roughly one per child.
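To make the call pattern concrete, here is a minimal, self-contained replica of the loop above, run on a hypothetical node type (`CountingNode`, not Hive's `ASTNode`) whose getChildren() copies its children into a new list on every call, as described for ASTNode.getChildren() in this comment. The replica counts every call, including the null check, the containsAll() argument and the for loop, so on a root with k leaf children it reports 3k + 2 calls rather than one per child; either way the count grows linearly with k, and the number of copied child references grows with k².

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class WalkCallCount {
    // Hypothetical stand-in for ASTNode: getChildren() copies the children into
    // a fresh list on every call and records how often it was called.
    static final class CountingNode {
        final List<CountingNode> kids = new ArrayList<>();
        int calls = 0;    // getChildren() invocations on this node
        long copied = 0;  // child references copied into returned lists

        List<CountingNode> getChildren() {
            calls++;
            if (kids.isEmpty()) {
                return null; // leaf: matches the null check in walk()
            }
            copied += kids.size();
            return new ArrayList<>(kids);
        }
    }

    // Replica of the walk loop quoted above; "dispatch" only marks the node as done.
    static void walk(CountingNode root, Set<CountingNode> dispatched) {
        Deque<CountingNode> opStack = new ArrayDeque<>();
        opStack.push(root);
        while (!opStack.isEmpty()) {
            CountingNode node = opStack.peek();
            if (node.getChildren() == null
                    || dispatched.containsAll(node.getChildren())) {
                if (!dispatched.contains(node)) {
                    dispatched.add(node); // stands in for dispatch(node, opStack)
                }
                opStack.pop();
                continue;
            }
            // Push the first undispatched child and restart the loop
            for (CountingNode child : node.getChildren()) {
                if (!dispatched.contains(child)) {
                    opStack.push(child);
                    break;
                }
            }
        }
    }

    // Builds a root with k leaf children, roughly like an IN list with k entries.
    static CountingNode fanOut(int k) {
        CountingNode root = new CountingNode();
        for (int i = 0; i < k; i++) {
            root.kids.add(new CountingNode());
        }
        return root;
    }

    public static void main(String[] args) {
        for (int k : new int[] {12, 100, 1_000}) {
            CountingNode root = fanOut(k);
            walk(root, new HashSet<>());
            System.out.printf("k=%d calls=%d copied=%d%n", k, root.calls, root.copied);
        }
    }
}
```

For k = 12, 100 and 1,000 the program reports 38, 302 and 3,002 calls on the root, with 456, 30,200 and 3,002,000 copied child references respectively.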
Now, if the IN clause has a huge list, this causes:
1. As many invocations of getChildren() as there are children. So if the IN
clause has 50K values, getChildren() is invoked about 50K times.
2. Memory pressure from the huge number of nodes and the repeated calls to
ASTNode.getChildren(), since it builds and returns a list of all the children
on every call. With k children, about k calls each returning k elements add up
to roughly k² element copies.
3. Since the thread takes a lock before compilation starts, it blocks other
compilations from making progress.
Depending on the query, compilation can be an order of magnitude slower.
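One way to avoid the repeated calls, sketched here under stated assumptions and not taken from Hive, is to fetch each node's children once and keep a cursor to the next child in the stack frame, so getChildren() runs once per node. The `Node` and `Frame` types are hypothetical stand-ins, and `walk` returns the dispatch order (children before parents) instead of calling a real dispatcher.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CachedChildrenWalk {
    // Hypothetical node type; getChildren() copies on every call and counts invocations.
    static final class Node {
        final List<Node> kids = new ArrayList<>();
        int calls = 0;

        List<Node> getChildren() {
            calls++;
            return kids.isEmpty() ? null : new ArrayList<>(kids);
        }
    }

    // Stack frame: a node, its children fetched once, and the index of the next child.
    static final class Frame {
        final Node node;
        final List<Node> children;
        int next = 0;

        Frame(Node node) {
            this.node = node;
            this.children = node.getChildren(); // the only getChildren() call for this node
        }
    }

    // Post-order walk: children are dispatched before their parent, each node once.
    static List<Node> walk(Node root) {
        Set<Node> dispatched = new HashSet<>();
        List<Node> order = new ArrayList<>();
        Deque<Frame> stack = new ArrayDeque<>();
        stack.push(new Frame(root));
        while (!stack.isEmpty()) {
            Frame top = stack.peek();
            if (top.children != null && top.next < top.children.size()) {
                // Advance the cursor; descend only into children not yet dispatched
                Node child = top.children.get(top.next++);
                if (!dispatched.contains(child)) {
                    stack.push(new Frame(child));
                }
                continue;
            }
            // All children handled: dispatch this node
            stack.pop();
            if (dispatched.add(top.node)) {
                order.add(top.node); // stands in for dispatch(node, stack)
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Node root = new Node();
        for (int i = 0; i < 50_000; i++) {
            root.kids.add(new Node());
        }
        List<Node> order = walk(root);
        System.out.println("dispatched=" + order.size() + " root.getChildren() calls=" + root.calls);
    }
}
```

On a root with 50,000 leaf children this prints `dispatched=50001 root.getChildren() calls=1`. The trade-off is that the cached list reflects the children as they were when the frame was created, so the sketch assumes nodes are not rewired while the walk is in progress.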
> Avoid expensive call to removeAll in DefaultGraphWalker
> -------------------------------------------------------
>
> Key: HIVE-11652
> URL: https://issues.apache.org/jira/browse/HIVE-11652
> Project: Hive
> Issue Type: Bug
> Components: Logical Optimizer, Physical Optimizer
> Affects Versions: 1.3.0, 2.0.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-11652.01.patch, HIVE-11652.02.patch,
> HIVE-11652.patch
>
>
> When the plan is too large, the removeAll call in DefaultGraphWalker (line
> 140) takes a very long time, as it has to go through the list looking for
> each of the nodes. We try to get rid of this call by rewriting the logic in
> the walker.
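The removeAll cost described in the issue follows from how java.util.ArrayList.removeAll(c) works: it calls c.contains(e) for each element e of the list, so with a list argument every probe is a linear scan and the call is O(n·m). The sketch below (class and helper names are hypothetical) illustrates that, together with the common remedy of probing a HashSet instead. It is not the change the patch made, which rewrites the walker to avoid the call.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

public class RemoveAllCost {
    // ArrayList.removeAll(c) calls c.contains(e) for every element e of the receiver.
    // With a List argument each contains() is a linear scan: O(n * m) overall.
    static List<Integer> removeWithList(List<Integer> from, List<Integer> toRemove) {
        List<Integer> copy = new ArrayList<>(from);
        copy.removeAll(toRemove);
        return copy;
    }

    // Same result, but HashSet.contains() is expected O(1): O(n + m) overall.
    static List<Integer> removeWithSet(List<Integer> from, List<Integer> toRemove) {
        List<Integer> copy = new ArrayList<>(from);
        copy.removeAll(new HashSet<>(toRemove));
        return copy;
    }

    // Returns the integers start..end-1 as a list.
    static List<Integer> range(int start, int end) {
        List<Integer> out = new ArrayList<>();
        for (int i = start; i < end; i++) {
            out.add(i);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> from = range(0, 10_000);
        List<Integer> toRemove = range(5_000, 15_000);
        long t0 = System.nanoTime();
        List<Integer> a = removeWithList(from, toRemove);
        long t1 = System.nanoTime();
        List<Integer> b = removeWithSet(from, toRemove);
        long t2 = System.nanoTime();
        System.out.println("same result: " + a.equals(b));
        System.out.printf("list arg: %d ms, set arg: %d ms%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
    }
}
```

Both variants return the same list; only the cost of each contains() probe differs, and the printed timings depend on the machine.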