ngsg commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1388097859
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -678,38 +678,34 @@ private boolean generateSemiJoinOperatorPlan(DynamicListContext ctx, ParseContex
      ArrayList<ColumnInfo> groupbyColInfos = new ArrayList<ColumnInfo>();
      groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(0), key.getTypeInfo(), "", false));
      groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(1), key.getTypeInfo(), "", false));
-     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), key.getTypeInfo(), "", false));
+     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), TypeInfoFactory.binaryTypeInfo, "", false));
      GroupByOperator groupByOp = (GroupByOperator)OperatorFactory.getAndMakeChild(
          groupBy, new RowSchema(groupbyColInfos), selectOp);
      groupByOp.setColumnExprMap(new HashMap<String, ExprNodeDesc>());
      // Get the column names of the aggregations for reduce sink
-     int colPos = 0;
      ArrayList<ExprNodeDesc> rsValueCols = new ArrayList<ExprNodeDesc>();
      Map<String, ExprNodeDesc> columnExprMap = new HashMap<String, ExprNodeDesc>();
-     for (int i = 0; i < aggs.size() - 1; i++) {
-       ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(key.getTypeInfo(),
-           gbOutputNames.get(colPos), "", false);
+     ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
+     for (int colPos = 0; colPos < aggs.size(); colPos++) {
+       TypeInfo typInfo = groupbyColInfos.get(colPos).getType();
+       ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(typInfo, gbOutputNames.get(colPos), "", false);
        rsValueCols.add(colExpr);
-       columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-       colPos++;
-     }
+       columnExprMap.put(Utilities.ReduceField.VALUE + "." + gbOutputNames.get(colPos), colExpr);
-     // Bloom Filter uses binary
-     ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(TypeInfoFactory.binaryTypeInfo,
-         gbOutputNames.get(colPos), "", false);
-     rsValueCols.add(colExpr);
-     columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-     colPos++;
+     ColumnInfo colInfo =
Review Comment:
(Please understand that I may be wrong as it has been a long time since I
worked on this issue. I'll check this issue again and leave you a comment if
any corrections are needed.)
The RS operator forwards key-value pairs to its child operator, and an RS
operator's columns are separated into KEY and VALUE by prepending
`ReduceField.KEY` and `ReduceField.VALUE` to the column names.
According to our records, we encountered a "cannot find field _col0 from
[0:key, 1:value]" issue when we set the correct ColumnExprMap on the GBY
operator. We found that the SEL operator generated by PEF tries to read
`_col0`, but the RS operator only provides `VALUE._col0`. So we prepend the
prefix in the column schema, as PEF refers to it during SEL operator creation.
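To make the naming convention concrete, here is a minimal stand-alone sketch
(not Hive code: `RsColumnPrefixSketch`, its `buildColumnExprMap` helper, and
the use of plain `String` values instead of real `ExprNodeDesc` objects are
all illustrative stand-ins) of why a child operator that looks up the bare
column name fails when the RS map is keyed with the `ReduceField.VALUE`
prefix:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: mimics how an RS operator's columnExprMap is keyed with
// a "VALUE." prefix (standing in for Utilities.ReduceField.VALUE + ".").
public class RsColumnPrefixSketch {

    static Map<String, String> buildColumnExprMap(String[] gbOutputNames) {
        Map<String, String> columnExprMap = new LinkedHashMap<>();
        for (String col : gbOutputNames) {
            // RS exposes the column as VALUE.<name>; a child operator that
            // resolves the bare name (e.g. "_col0") will not find it.
            columnExprMap.put("VALUE." + col, col);
        }
        return columnExprMap;
    }

    public static void main(String[] args) {
        Map<String, String> m =
            buildColumnExprMap(new String[]{"_col0", "_col1", "_col2"});
        // Lookups must use the prefixed name, not the bare one.
        System.out.println(m.containsKey("VALUE._col0"));
        System.out.println(m.containsKey("_col0"));
    }
}
```

Under this convention, a resolver handed the bare `_col0` finds nothing,
which mirrors the "cannot find field" error described above.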
I can't remember why DPPOptimization works fine without
`ReduceField.KEY/VALUE`, but our records say we verified that the other
components that create RS operators also use `ReduceField.KEY/VALUE` when
setting the RS operator's schema.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]