[
https://issues.apache.org/jira/browse/HIVE-25758?focusedWorklogId=720032&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-720032
]
ASF GitHub Bot logged work on HIVE-25758:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 03/Feb/22 11:15
Start Date: 03/Feb/22 11:15
Worklog Time Spent: 10m
Work Description: asolimando commented on a change in pull request #2966:
URL: https://github.com/apache/hive/pull/2966#discussion_r798459592
##########
File path:
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveJoinPushTransitivePredicatesRule.java
##########
@@ -144,27 +146,70 @@ public void onMatch(RelOptRuleCall call) {
}
// We need to filter i) those that have been pushed already as stored in the join,
- // and ii) those that were already in the subtree rooted at child
- ImmutableList<RexNode> toPush = HiveCalciteUtil.getPredsNotPushedAlready(predicatesToExclude,
- child, valids);
- return toPush;
+ // ii) those that were already in the subtree rooted at child,
+ // iii) predicates that are not safe for transitive inference.
+ //
+ // There is no formal definition of safety for predicate inference, only an empirical one.
+ // An unsafe predicate in this context is one that when pushed across join operands, can lead
+ // to redundant predicates that cannot be simplified (by means of predicates merging with other existing ones).
+ // This situation can lead to an OOM for cases where lack of simplification allows inferring new predicates
+ // (from LHS to RHS) recursively, predicates which are redundant, but that RexSimplify cannot handle.
+ // This notion can be relaxed as soon as RexSimplify gets more powerful, and it can handle such cases.
+ List<RexNode> toPush = HiveCalciteUtil.getPredsNotPushedAlready(predicatesToExclude, child, valids).stream()
+ .filter(UNSAFE_OPERATORS_FINDER::isSafe)
+ .collect(Collectors.toList());
+
+ return ImmutableList.copyOf(toPush);
}
- private RexNode getTypeSafePred(RelOptCluster cluster, RexNode rex, RelDataType rType) {
- RexNode typeSafeRex = rex;
- if ((typeSafeRex instanceof RexCall) && HiveCalciteUtil.isComparisonOp((RexCall) typeSafeRex)) {
- RexBuilder rb = cluster.getRexBuilder();
- List<RexNode> fixedPredElems = new ArrayList<RexNode>();
- RelDataType commonType = cluster.getTypeFactory().leastRestrictive(
- RexUtil.types(((RexCall) rex).getOperands()));
- for (RexNode rn : ((RexCall) rex).getOperands()) {
- fixedPredElems.add(rb.ensureType(commonType, rn, true));
- }
+ //~ Inner Classes ----------------------------------------------------------
+
+ /**
+ * Finds unsafe operators in an expression (at any level of nesting).
+ */
+ private static class UnsafeOperatorsFinder extends RexVisitorImpl<Void> {
+ // accounting for DeMorgan's law
+ boolean inNegation = false;
- typeSafeRex = rb.makeCall(((RexCall) typeSafeRex).getOperator(), fixedPredElems);
+ protected UnsafeOperatorsFinder(boolean deep) {
+ super(deep);
}
- return typeSafeRex;
+ @Override
+ public Void visitCall(RexCall call) {
+ switch (call.getKind()) {
+ case OR:
+ if (inNegation) {
+ return super.visitCall(call);
+ } else {
+ throw Util.FoundOne.NULL;
Review comment:
I know it's an anti-pattern. It is used throughout the codebase (IIRC I have
seen it even in Calcite), but it's better not to add another instance of it.
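For illustration only, here is a minimal sketch of how the same check could report its result through a return value instead of throwing Util.FoundOne. It uses a toy Expr type as a stand-in for Calcite's RexNode/RexCall, so Expr, Kind, and SafePredicateSketch are hypothetical names and not part of Hive or Calcite. The diff above is cut off after the OR branch, so the AND and NOT branches below are an assumption inferred from the "accounting for DeMorgan's law" comment, not the PR's actual code.

```java
import java.util.Arrays;
import java.util.List;

// Toy expression tree (hypothetical), standing in for Calcite's RexNode/RexCall.
final class Expr {
  enum Kind { OR, AND, NOT, EQUALS, INPUT_REF, LITERAL }

  final Kind kind;
  final List<Expr> operands;

  Expr(Kind kind, Expr... operands) {
    this.kind = kind;
    this.operands = Arrays.asList(operands);
  }
}

final class SafePredicateSketch {
  // Returns true when no unsafe operator occurs at any nesting level.
  static boolean isSafe(Expr e) {
    return isSafe(e, false);
  }

  // The negation state is passed down as a parameter, so it never leaks
  // from one subtree into a sibling subtree.
  private static boolean isSafe(Expr e, boolean inNegation) {
    switch (e.kind) {
      case OR:
        // A plain OR is treated as unsafe, as in the OR branch of the diff.
        if (!inNegation) {
          return false;
        }
        break;
      case AND:
        // Assumption: NOT(a AND b) is equivalent to (NOT a) OR (NOT b).
        if (inNegation) {
          return false;
        }
        break;
      case NOT:
        // Assumption: each NOT flips the negation state for its operand.
        inNegation = !inNegation;
        break;
      default:
        break;
    }
    for (Expr operand : e.operands) {
      if (!isSafe(operand, inNegation)) {
        return false; // stop at the first unsafe operator, no exception needed
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Expr eq = new Expr(Expr.Kind.EQUALS,
        new Expr(Expr.Kind.INPUT_REF), new Expr(Expr.Kind.LITERAL));
    System.out.println(isSafe(eq));
    System.out.println(isSafe(new Expr(Expr.Kind.OR, eq, eq)));
  }
}
```

The trade-off is that a boolean-returning traversal cannot reuse RexVisitorImpl<Void> as-is; with Calcite's visitor it would probably need a Boolean-typed visitor or a field that records the finding.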
Issue Time Tracking
-------------------
Worklog Id: (was: 720032)
Time Spent: 1h (was: 50m)
> OOM due to recursive application of CBO rules
> ---------------------------------------------
>
> Key: HIVE-25758
> URL: https://issues.apache.org/jira/browse/HIVE-25758
> Project: Hive
> Issue Type: Bug
> Components: CBO, Query Planning
> Affects Versions: 4.0.0
> Reporter: Alessandro Solimando
> Assignee: Alessandro Solimando
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
>
> Reproducing query is as follows:
> {code:java}
> create table test1 (act_nbr string);
> create table test2 (month int);
> create table test3 (mth int, con_usd double);
> EXPLAIN
> SELECT c.month,
> d.con_usd
> FROM
> (SELECT
> cast(regexp_replace(substr(add_months(from_unixtime(unix_timestamp(),
> 'yyyy-MM-dd'), -1), 1, 7), '-', '') AS int) AS month
> FROM test1
> UNION ALL
> SELECT month
> FROM test2
> WHERE month = 202110) c
> JOIN test3 d ON c.month = d.mth; {code}
>
> Different plans are generated during the first CBO steps, last being:
> {noformat}
> 2021-12-01T08:28:08,598 DEBUG [a18191bb-3a2b-4193-9abf-4e37dd1996bb main] parse.CalcitePlanner: Plan after decorrelation:
> HiveProject(month=[$0], con_usd=[$2])
>   HiveJoin(condition=[=($0, $1)], joinType=[inner], algorithm=[none], cost=[not available])
>     HiveProject(month=[$0])
>       HiveUnion(all=[true])
>         HiveProject(month=[CAST(regexp_replace(substr(add_months(FROM_UNIXTIME(UNIX_TIMESTAMP, _UTF-16LE'yyyy-MM-dd':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), -1), 1, 7), _UTF-16LE'-':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")):INTEGER])
>           HiveTableScan(table=[[default, test1]], table:alias=[test1])
>         HiveProject(month=[$0])
>           HiveFilter(condition=[=($0, CAST(202110):INTEGER)])
>             HiveTableScan(table=[[default, test2]], table:alias=[test2])
>     HiveTableScan(table=[[default, test3]], table:alias=[d]){noformat}
>
> Then, the HEP planner will keep expanding the filter expression with
> redundant expressions, such as the following, where the identical CAST
> expression is present multiple times:
>
> {noformat}
> rel#118:HiveFilter.HIVE.[].any(input=HepRelVertex#39,condition=IN(CAST(regexp_replace(substr(add_months(FROM_UNIXTIME(UNIX_TIMESTAMP,
> _UTF-16LE'yyyy-MM-dd':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), -1), 1,
> 7), _UTF-16LE'-':VARCHAR(2147483647) CHARACTER SET "UTF-16LE",
> _UTF-16LE'':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")):INTEGER,
> CAST(regexp_replace(substr(add_months(FROM_UNIXTIME(UNIX_TIMESTAMP,
> _UTF-16LE'yyyy-MM-dd':VARCHAR(2147483647) CHARACTER SET "UTF-16LE"), -1), 1,
> 7), _UTF-16LE'-':VARCHAR(2147483647) CHARACTER SET "UTF-16LE",
> _UTF-16LE'':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")):INTEGER,
> 202110)){noformat}
>
> The problem seems to come from a bad interaction of at least
> _HiveFilterProjectTransposeRule_ and
> _HiveJoinPushTransitivePredicatesRule_, possibly more.
> Most probably the UNION part can be removed and the reproducer simplified
> even further.
>