[GitHub] [hive] scarlin-cloudera commented on a diff in pull request #3761: HIVE-26737: Subquery returning wrong results when database has materialized views

GitBox Thu, 01 Dec 2022 16:29:01 -0800


scarlin-cloudera commented on code in PR #3761:
URL: https://github.com/apache/hive/pull/3761#discussion_r1037682486



##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSubQueryRemoveRule.java:
##########
@@ -104,44 +106,43 @@ private HiveSubQueryRemoveRule(RelOptRuleOperand operand, 
String description, Hi
     // if subquery is in FILTER
     if (relNode instanceof HiveFilter) {
       final HiveFilter filter = call.rel(0);
-      final RexSubQuery e = RexUtil.SubQueryFinder.find(filter.getCondition());
-      assert e != null;
-
-      final RelOptUtil.Logic logic =
-          LogicVisitor.find(RelOptUtil.Logic.TRUE, 
ImmutableList.of(filter.getCondition()), e);
+      // Since there is a RexSubQuery, there should be a HiveCorrelationInfo
+      // in the RelNode
+      Preconditions.checkState(filter.getCorrelationInfos().size() > 0);
+      HiveCorrelationInfo correlationInfo = 
filter.getCorrelationInfos().get(0);
+      final RelOptUtil.Logic logic = LogicVisitor.find(RelOptUtil.Logic.TRUE,
+          ImmutableList.of(filter.getCondition()), 
correlationInfo.rexSubQuery);
       builder.push(filter.getInput());
       final int fieldCount = builder.peek().getRowType().getFieldCount();
 
-      SubqueryConf subqueryConfig = 
filter.getCluster().getPlanner().getContext().unwrap(SubqueryConf.class);
-      boolean isCorrScalarQuery = 
subqueryConfig.getCorrScalarRexSQWithAgg().contains(e.rel);
+      boolean isCorrScalarQuery = correlationInfo.isCorrScalarQuery();
 
       final RexNode target =
-          apply(call.getMetadataQuery(), e, HiveFilter.getVariablesSet(e), 
logic, builder, 1,
-              fieldCount, isCorrScalarQuery);
-      final RexShuttle shuttle = new ReplaceSubQueryShuttle(e, target);
+          apply(call.getMetadataQuery(), correlationInfo.rexSubQuery,
+              correlationInfo.correlationIds, logic, builder, 1, fieldCount, 
isCorrScalarQuery);
+      final RexShuttle shuttle = new 
ReplaceSubQueryShuttle(correlationInfo.rexSubQuery, target);
       builder.filter(shuttle.apply(filter.getCondition()));
       builder.project(fields(builder, filter.getRowType().getFieldCount()));
       RelNode newRel = builder.build();
       call.transformTo(newRel);
     } else if (relNode instanceof HiveProject) {
-      // if subquery is in PROJECT
       final HiveProject project = call.rel(0);
-      final RexSubQuery e = RexUtil.SubQueryFinder.find(project.getProjects());
-      assert e != null;
+      // Since there is a RexSubQuery, there should be a HiveCorrelationInfo
+      // in the RelNode
+      Preconditions.checkState(project.getCorrelationInfos().size() > 0);
+      HiveCorrelationInfo correlationInfo = 
project.getCorrelationInfos().get(0);

Review Comment:
   Gonna check this tomorrow, but this is what I think is happening...
   
   I think the rule just happens to get hit twice.  After the first correlation 
is grabbed, it creates a new filter with one less CorrelationInfo.  The rule 
gets triggered again and processes the second CorrelationInfo.
   
   On my first pass, I did change this by putting in a for loop.  I had some 
changes in the plans when I tried that.  And I also figured that this makes the 
code review cleaner by not adding a for loop?  I can try it again though.  At 
least if I try it now, the next code review would be only the change of adding 
a for loop.  Just let me know and I'll try this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [hive] scarlin-cloudera commented on a diff in pull request #3761: HIVE-26737: Subquery returning wrong results when database has materialized views

Reply via email to