scarlin-cloudera commented on code in PR #3761:
URL: https://github.com/apache/hive/pull/3761#discussion_r1037682486
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveSubQueryRemoveRule.java:
##########
@@ -104,44 +106,43 @@ private HiveSubQueryRemoveRule(RelOptRuleOperand operand,
String description, Hi
// if subquery is in FILTER
if (relNode instanceof HiveFilter) {
final HiveFilter filter = call.rel(0);
- final RexSubQuery e = RexUtil.SubQueryFinder.find(filter.getCondition());
- assert e != null;
-
- final RelOptUtil.Logic logic =
- LogicVisitor.find(RelOptUtil.Logic.TRUE,
ImmutableList.of(filter.getCondition()), e);
+ // Since there is a RexSubQuery, there should be a HiveCorrelationInfo
+ // in the RelNode
+ Preconditions.checkState(filter.getCorrelationInfos().size() > 0);
+ HiveCorrelationInfo correlationInfo =
filter.getCorrelationInfos().get(0);
+ final RelOptUtil.Logic logic = LogicVisitor.find(RelOptUtil.Logic.TRUE,
+ ImmutableList.of(filter.getCondition()),
correlationInfo.rexSubQuery);
builder.push(filter.getInput());
final int fieldCount = builder.peek().getRowType().getFieldCount();
- SubqueryConf subqueryConfig =
filter.getCluster().getPlanner().getContext().unwrap(SubqueryConf.class);
- boolean isCorrScalarQuery =
subqueryConfig.getCorrScalarRexSQWithAgg().contains(e.rel);
+ boolean isCorrScalarQuery = correlationInfo.isCorrScalarQuery();
final RexNode target =
- apply(call.getMetadataQuery(), e, HiveFilter.getVariablesSet(e),
logic, builder, 1,
- fieldCount, isCorrScalarQuery);
- final RexShuttle shuttle = new ReplaceSubQueryShuttle(e, target);
+ apply(call.getMetadataQuery(), correlationInfo.rexSubQuery,
+ correlationInfo.correlationIds, logic, builder, 1, fieldCount,
isCorrScalarQuery);
+ final RexShuttle shuttle = new
ReplaceSubQueryShuttle(correlationInfo.rexSubQuery, target);
builder.filter(shuttle.apply(filter.getCondition()));
builder.project(fields(builder, filter.getRowType().getFieldCount()));
RelNode newRel = builder.build();
call.transformTo(newRel);
} else if (relNode instanceof HiveProject) {
- // if subquery is in PROJECT
final HiveProject project = call.rel(0);
- final RexSubQuery e = RexUtil.SubQueryFinder.find(project.getProjects());
- assert e != null;
+ // Since there is a RexSubQuery, there should be a HiveCorrelationInfo
+ // in the RelNode
+ Preconditions.checkState(project.getCorrelationInfos().size() > 0);
+ HiveCorrelationInfo correlationInfo =
project.getCorrelationInfos().get(0);
Review Comment:
Gonna check this tomorrow, but this is what I think is happening...
I think the rule just happens to get hit twice. After the first correlation
is grabbed, it creates a new filter with one less CorrelationInfo. The rule
gets triggered again and processes the second CorrelationInfo.
On my first pass, I did change this by putting in a for loop. I had some
changes in the plans when I tried that. It's probably cleaner and more optimal
to do it in one for loop, but there are other changes happening in the
RelNodes. Since the previous code didn't have a for loop, there shouldn't be
any performance degradation by not including it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]