[
https://issues.apache.org/jira/browse/HIVE-25758?focusedWorklogId=761754&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-761754
]
ASF GitHub Bot logged work on HIVE-25758:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 25/Apr/22 12:30
Start Date: 25/Apr/22 12:30
Worklog Time Spent: 10m
Work Description: asolimando commented on code in PR #2966:
URL: https://github.com/apache/hive/pull/2966#discussion_r857573180
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinPushTransitivePredicatesRule.java:
##########
@@ -143,28 +138,82 @@ private ImmutableList<RexNode>
getValidPreds(RelOptCluster cluster, RelNode chil
}
}
- // We need to filter i) those that have been pushed already as stored in
the join,
- // and ii) those that were already in the subtree rooted at child
- ImmutableList<RexNode> toPush =
HiveCalciteUtil.getPredsNotPushedAlready(predicatesToExclude,
- child, valids);
- return toPush;
+ // We need to filter:
+ // i) those that have been pushed already as stored in the join,
+ // ii) those that were already in the subtree rooted at child.
+ List<RexNode> toPush =
HiveCalciteUtil.getPredsNotPushedAlready(predicatesToExclude, child, valids);
+
+ // If we run the rule in conservative mode, we also filter:
+ // iii) predicates that are not safe for transitive inference.
+ //
+ // There is no formal definition of safety for predicate inference, only
an empirical one.
+ // An unsafe predicate in this context is one that when pushed across join
operands, can lead
+ // to redundant predicates that cannot be simplified (by means of
predicates merging with other existing ones).
+ // This situation can lead to an OOM for cases where lack of
simplification allows inferring new predicates
+ // (from LHS to RHS and vice-versa) recursively, predicates which are
redundant, but that RexSimplify cannot handle.
+ // This notion can be relaxed as soon as RexSimplify gets more powerful,
and it can handle such cases.
+ if (HiveConf.getBoolVar(conf,
HiveConf.ConfVars.HIVE_JOIN_PUSH_TRANSITIVE_PREDICATES_CONSERVATIVE)) {
+ toPush = toPush.stream()
+ .filter(unsafeOperatorsFinder::isSafe)
+ .collect(Collectors.toList());
+ }
+
+ return ImmutableList.copyOf(toPush);
}
- private RexNode getTypeSafePred(RelOptCluster cluster, RexNode rex,
RelDataType rType) {
- RexNode typeSafeRex = rex;
- if ((typeSafeRex instanceof RexCall) &&
HiveCalciteUtil.isComparisonOp((RexCall) typeSafeRex)) {
- RexBuilder rb = cluster.getRexBuilder();
- List<RexNode> fixedPredElems = new ArrayList<RexNode>();
- RelDataType commonType = cluster.getTypeFactory().leastRestrictive(
- RexUtil.types(((RexCall) rex).getOperands()));
- for (RexNode rn : ((RexCall) rex).getOperands()) {
- fixedPredElems.add(rb.ensureType(commonType, rn, true));
- }
+ //~ Inner Classes ----------------------------------------------------------
+
+ /**
+ * Finds unsafe operators in an expression (at any level of nesting).
+ * At the moment, the only unsafe operator is OR.
+ *
+ * Example 1: OR(=($0, $1), IS NOT NULL($2))):INTEGER (OR in the top-level
expression)
+ * Example 2: NOT(AND(=($0, $1), IS NOT NULL($2))
+ * this is equivalent to OR((<>($0, $1), IS NULL($2))
+ * Example 3: AND(OR(=($0, $1), IS NOT NULL($2)))) (OR in inner expression)
+ */
+ private static class UnsafeOperatorsFinder extends RexVisitorImpl<Void> {
Review Comment:
Agreed, it's better to start with what we have right now, we can always make
it more generic if needed later on.
Issue Time Tracking
-------------------
Worklog Id: (was: 761754)
Time Spent: 2h 20m (was: 2h 10m)
> OOM due to recursive application of CBO rules
> ---------------------------------------------
>
> Key: HIVE-25758
> URL: https://issues.apache.org/jira/browse/HIVE-25758
> Project: Hive
> Issue Type: Bug
> Components: CBO, Query Planning
> Affects Versions: 4.0.0
> Reporter: Alessandro Solimando
> Assignee: Alessandro Solimando
> Priority: Major
> Labels: pull-request-available
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
>
> Reproducing query is as follows:
> {code:java}
> create table test1 (act_nbr string);
> create table test2 (month int);
> create table test3 (mth int, con_usd double);
> EXPLAIN
> SELECT c.month,
> d.con_usd
> FROM
> (SELECT
> cast(regexp_replace(substr(add_months(from_unixtime(unix_timestamp(),
> 'yyyy-MM-dd'), -1), 1, 7), '-', '') AS int) AS month
> FROM test1
> UNION ALL
> SELECT month
> FROM test2
> WHERE month = 202110) c
> JOIN test3 d ON c.month = d.mth; {code}
>
> Different plans are generated during the first CBO steps, last being:
> {noformat}
> 2021-12-01T08:28:08,598 DEBUG [a18191bb-3a2b-4193-9abf-4e37dd1996bb main]
> parse.CalcitePlanner: Plan after decorre
> lation:
> HiveProject(month=[$0], con_usd=[$2])
> HiveJoin(condition=[=($0, $1)], joinType=[inner], algorithm=[none],
> cost=[not available])
> HiveProject(month=[$0])
> HiveUnion(all=[true])
>
> HiveProject(month=[CAST(regexp_replace(substr(add_months(FROM_UNIXTIME(UNIX_TIMESTAMP,
> _UTF-16LE'yyyy-MM-d
> d':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), -1), 1, 7),
> _UTF-16LE'-':VARCHAR(2147483647) CHARACTER SET "UTF-
> 16LE", _UTF-16LE'':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")):INTEGER])
> HiveTableScan(table=[[default, test1]], table:alias=[test1])
> HiveProject(month=[$0])
> HiveFilter(condition=[=($0, CAST(202110):INTEGER)])
> HiveTableScan(table=[[default, test2]], table:alias=[test2])
> HiveTableScan(table=[[default, test3]], table:alias=[d]){noformat}
>
> Then, the HEP planner will keep expanding the filter expression with
> redundant expressions, such as the following, where the identical CAST
> expression is present multiple times:
>
> {noformat}
> rel#118:HiveFilter.HIVE.[].any(input=HepRelVertex#39,condition=IN(CAST(regexp_replace(substr(add_months(FROM_UNIXTIME(UNIX_TIMESTAMP,
> _UTF-16LE'yyyy-MM-dd':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), -1), 1,
> 7), _UTF-16LE'-':VARCHAR(2147483647) CHARACTER SET "UTF-16LE",
> _UTF-16LE'':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")):INTEGER,
> CAST(regexp_replace(substr(add_months(FROM_UNIXTIME(UNIX_TIMESTAMP,
> _UTF-16LE'yyyy-MM-dd':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), -1), 1,
> 7), _UTF-16LE'-':VARCHAR(2147483647) CHARACTER SET "UTF-16LE",
> _UTF-16LE'':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")):INTEGER,
> 202110)){noformat}
>
> The problem seems to come from a bad interaction of at least
> _HiveFilterProjectTransposeRule_ and
> {_}HiveJoinPushTransitivePredicatesRule{_}, possibly more.
> Most probably then UNION part can be removed and the reproducer be simplified
> even further.
>
--
This message was sent by Atlassian Jira
(v8.20.7#820007)