[GitHub] [hive] scarlin-cloudera commented on a diff in pull request #3761: HIVE-26737: Subquery returning wrong results when database has materialized views

GitBox Thu, 01 Dec 2022 17:55:03 -0800


scarlin-cloudera commented on code in PR #3761:
URL: https://github.com/apache/hive/pull/3761#discussion_r1037682486



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSubQueryRemoveRule.java:
##########
@@ -104,44 +106,43 @@ private HiveSubQueryRemoveRule(RelOptRuleOperand operand, 
String description, Hi
     // if subquery is in FILTER
     if (relNode instanceof HiveFilter) {
       final HiveFilter filter = call.rel(0);
-      final RexSubQuery e = RexUtil.SubQueryFinder.find(filter.getCondition());
-      assert e != null;
-
-      final RelOptUtil.Logic logic =
-          LogicVisitor.find(RelOptUtil.Logic.TRUE, 
ImmutableList.of(filter.getCondition()), e);
+      // Since there is a RexSubQuery, there should be a HiveCorrelationInfo
+      // in the RelNode
+      Preconditions.checkState(filter.getCorrelationInfos().size() > 0);
+      HiveCorrelationInfo correlationInfo = 
filter.getCorrelationInfos().get(0);
+      final RelOptUtil.Logic logic = LogicVisitor.find(RelOptUtil.Logic.TRUE,
+          ImmutableList.of(filter.getCondition()), 
correlationInfo.rexSubQuery);
       builder.push(filter.getInput());
       final int fieldCount = builder.peek().getRowType().getFieldCount();
 
-      SubqueryConf subqueryConfig = 
filter.getCluster().getPlanner().getContext().unwrap(SubqueryConf.class);
-      boolean isCorrScalarQuery = 
subqueryConfig.getCorrScalarRexSQWithAgg().contains(e.rel);
+      boolean isCorrScalarQuery = correlationInfo.isCorrScalarQuery();
 
       final RexNode target =
-          apply(call.getMetadataQuery(), e, HiveFilter.getVariablesSet(e), 
logic, builder, 1,
-              fieldCount, isCorrScalarQuery);
-      final RexShuttle shuttle = new ReplaceSubQueryShuttle(e, target);
+          apply(call.getMetadataQuery(), correlationInfo.rexSubQuery,
+              correlationInfo.correlationIds, logic, builder, 1, fieldCount, 
isCorrScalarQuery);
+      final RexShuttle shuttle = new 
ReplaceSubQueryShuttle(correlationInfo.rexSubQuery, target);
       builder.filter(shuttle.apply(filter.getCondition()));
       builder.project(fields(builder, filter.getRowType().getFieldCount()));
       RelNode newRel = builder.build();
       call.transformTo(newRel);
     } else if (relNode instanceof HiveProject) {
-      // if subquery is in PROJECT
       final HiveProject project = call.rel(0);
-      final RexSubQuery e = RexUtil.SubQueryFinder.find(project.getProjects());
-      assert e != null;
+      // Since there is a RexSubQuery, there should be a HiveCorrelationInfo
+      // in the RelNode
+      Preconditions.checkState(project.getCorrelationInfos().size() > 0);
+      HiveCorrelationInfo correlationInfo = 
project.getCorrelationInfos().get(0);

Review Comment:
   Gonna check this tomorrow, but this is what I think is happening...
   
   I think the rule just happens to get hit twice.  After the first correlation 
is grabbed, it creates a new filter with one less CorrelationInfo.  The rule 
gets triggered again and processes the second CorrelationInfo.
   
   On my first pass, I did change this by putting in a for loop.  I had some 
changes in the plans when I tried that.  It's probably cleaner and more optimal 
to do it in one for loop, but there are other changes happening in the 
RelNodes.  Since the previous code didn't have a for loop, there shouldn't be 
any performance degradation by not including it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [hive] scarlin-cloudera commented on a diff in pull request #3761: HIVE-26737: Subquery returning wrong results when database has materialized views

Reply via email to