[
https://issues.apache.org/jira/browse/HIVE-12189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967713#comment-14967713
]
Chao Sun commented on HIVE-12189:
---------------------------------
Sorry, didn't see this. I'll take a look at the patch today.
> The list in pushdownPreds of ppd.ExprWalkerInfo should not be allowed to grow
> very large
> ----------------------------------------------------------------------------------------
>
> Key: HIVE-12189
> URL: https://issues.apache.org/jira/browse/HIVE-12189
> Project: Hive
> Issue Type: Bug
> Components: Logical Optimizer
> Affects Versions: 1.1.0, 2.0.0
> Reporter: Yongzhi Chen
> Assignee: Yongzhi Chen
> Attachments: HIVE-12189.1.patch
>
>
> Some queries are very slow to compile. For example, the following query
> {noformat}
> select * from tt1 nf
> join tt2 a1 on (nf.col1 = a1.col1 and nf.hdp_databaseid = a1.hdp_databaseid)
> join tt3 a2 on (a2.col2 = a1.col2 and a2.col3 = nf.col3 and
> a2.hdp_databaseid = nf.hdp_databaseid)
> join tt4 a3 on (a3.col4 = a2.col4 and a3.col3 = a2.col3)
> join tt5 a4 on (a4.col4 = a2.col4 and a4.col5 = a2.col5 and a4.col3 =
> a2.col3 and a4.hdp_databaseid = nf.hdp_databaseid)
> join tt6 a5 on (a5.col3 = a2.col3 and a5.col2 = a2.col2 and
> a5.hdp_databaseid = nf.hdp_databaseid)
> JOIN tt7 a6 ON (a2.col3 = a6.col3 and a2.col2 = a6.col2 and a6.hdp_databaseid
> = nf.hdp_databaseid)
> JOIN tt8 a7 ON (a2.col3 = a7.col3 and a2.col2 = a7.col2 and a7.hdp_databaseid
> = nf.hdp_databaseid)
> where nf.hdp_databaseid = 102 limit 10;
> {noformat}
> takes around 120 seconds to compile in hive 1.1 when
> hive.mapred.mode=strict;
> hive.optimize.ppd=true;
> and hive is not in test mode.
> All of the above tables are partitioned on one column, and all of them are
> empty. When the tables are not empty, compilation is reportedly so slow that
> Hive appears to hang.
> In Hive 2.0, compilation is much faster: explain takes 6.6 seconds. But that
> is still a long time. One of the problems that slows PPD down is that the list
> in pushdownPreds can grow very large, which makes extractPushdownPreds perform
> badly:
> {noformat}
> public static ExprWalkerInfo extractPushdownPreds(OpWalkerInfo opContext,
> Operator<? extends OperatorDesc> op, List<ExprNodeDesc> preds)
> {noformat}
> While running the query above, at a breakpoint in extractPushdownPreds, preds
> has a size of 12051, and most entries in the list are:
> GenericUDFOPEqual(Column[hdp_databaseid], Const int 102),
> GenericUDFOPEqual(Column[hdp_databaseid], Const int 102),
> GenericUDFOPEqual(Column[hdp_databaseid], Const int 102),
> GenericUDFOPEqual(Column[hdp_databaseid], Const int 102), ....
> The following code in extractPushdownPreds clones every node in preds and
> then walks the clones. Hive 2.0 is faster because HIVE-11652 (and other JIRAs)
> made startWalking much faster, but we still clone thousands of nodes with the
> same expression. Should we store so many identical predicates in the list, or
> is just one good enough?
> {noformat}
> List<Node> startNodes = new ArrayList<Node>();
> List<ExprNodeDesc> clonedPreds = new ArrayList<ExprNodeDesc>();
> for (ExprNodeDesc node : preds) {
> ExprNodeDesc clone = node.clone();
> clonedPreds.add(clone);
> exprContext.getNewToOldExprMap().put(clone, node);
> }
> startNodes.addAll(clonedPreds);
> egw.startWalking(startNodes, null);
> {noformat}
> Should we change the following methods in
> java/org/apache/hadoop/hive/ql/ppd/ExprWalkerInfo.java
> public void addFinalCandidate(String alias, ExprNodeDesc expr)
> and
> public void addPushDowns(String alias, List<ExprNodeDesc> pushDowns)
> so that they add an expr only if it is not already in the pushdown list for
> that alias?
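
The change proposed at the end of the description could look roughly like the sketch below. It is not the attached patch and does not use Hive's classes: Pred is a hypothetical stand-in for ExprNodeDesc, PushdownDedup is a hypothetical stand-in for the per-alias lists kept by ExprWalkerInfo, and the isSame check assumes that some structural-equality test on expressions is available. Only the two method names come from the issue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PushdownDedup {

  /** Hypothetical stand-in for ExprNodeDesc; only structural equality matters here. */
  static final class Pred {
    final String text;

    Pred(String text) {
      this.text = text;
    }

    /** Assumed structural-equality check (two predicates with the same expression). */
    boolean isSame(Pred other) {
      return other != null && text.equals(other.text);
    }
  }

  /** Hypothetical stand-in for the alias -> pushdown predicate lists. */
  private final Map<String, List<Pred>> pushdownPreds = new HashMap<String, List<Pred>>();

  /** Adds expr for the alias unless an identical predicate is already stored. */
  public void addFinalCandidate(String alias, Pred expr) {
    List<Pred> preds = pushdownPreds.get(alias);
    if (preds == null) {
      preds = new ArrayList<Pred>();
      pushdownPreds.put(alias, preds);
    }
    for (Pred existing : preds) {
      if (existing.isSame(expr)) {
        return; // duplicate: keep the list from growing
      }
    }
    preds.add(expr);
  }

  /** Adds each predicate through the same duplicate check. */
  public void addPushDowns(String alias, List<Pred> pushDowns) {
    for (Pred p : pushDowns) {
      addFinalCandidate(alias, p);
    }
  }

  /** Number of distinct predicates stored for the alias. */
  public int size(String alias) {
    List<Pred> preds = pushdownPreds.get(alias);
    return preds == null ? 0 : preds.size();
  }

  public static void main(String[] args) {
    PushdownDedup info = new PushdownDedup();
    // Simulate the 12051 identical predicates seen at the breakpoint.
    for (int i = 0; i < 12051; i++) {
      info.addFinalCandidate("nf", new Pred("hdp_databaseid = 102"));
    }
    System.out.println(info.size("nf")); // prints 1
  }
}
```

The duplicate check is a linear scan, which costs more per insertion than a hash lookup, but it keeps the list short, so the scan stays cheap and the later clone-and-walk step in extractPushdownPreds would see one node per distinct predicate instead of thousands.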