ngsg commented on code in PR #4043:
URL: https://github.com/apache/hive/pull/4043#discussion_r1388097859
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java:
##########
@@ -678,38 +678,34 @@ private boolean generateSemiJoinOperatorPlan(DynamicListContext ctx, ParseContex
      ArrayList<ColumnInfo> groupbyColInfos = new ArrayList<ColumnInfo>();
      groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(0), key.getTypeInfo(), "", false));
      groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(1), key.getTypeInfo(), "", false));
-     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), key.getTypeInfo(), "", false));
+     groupbyColInfos.add(new ColumnInfo(gbOutputNames.get(2), TypeInfoFactory.binaryTypeInfo, "", false));
      GroupByOperator groupByOp = (GroupByOperator)OperatorFactory.getAndMakeChild(
          groupBy, new RowSchema(groupbyColInfos), selectOp);
      groupByOp.setColumnExprMap(new HashMap<String, ExprNodeDesc>());
      // Get the column names of the aggregations for reduce sink
-     int colPos = 0;
      ArrayList<ExprNodeDesc> rsValueCols = new ArrayList<ExprNodeDesc>();
      Map<String, ExprNodeDesc> columnExprMap = new HashMap<String, ExprNodeDesc>();
-     for (int i = 0; i < aggs.size() - 1; i++) {
-       ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(key.getTypeInfo(),
-           gbOutputNames.get(colPos), "", false);
+     ArrayList<ColumnInfo> rsColInfos = new ArrayList<>();
+     for (int colPos = 0; colPos < aggs.size(); colPos++) {
+       TypeInfo typInfo = groupbyColInfos.get(colPos).getType();
+       ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(typInfo, gbOutputNames.get(colPos), "", false);
        rsValueCols.add(colExpr);
-       columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-       colPos++;
-     }
+       columnExprMap.put(Utilities.ReduceField.VALUE + "." + gbOutputNames.get(colPos), colExpr);
-     // Bloom Filter uses binary
-     ExprNodeColumnDesc colExpr = new ExprNodeColumnDesc(TypeInfoFactory.binaryTypeInfo,
-         gbOutputNames.get(colPos), "", false);
-     rsValueCols.add(colExpr);
-     columnExprMap.put(gbOutputNames.get(colPos), colExpr);
-     colPos++;
+     ColumnInfo colInfo =
Review Comment:
(Please understand that I may be wrong as it has been a long time since I
worked on this issue. I'll check this issue again and leave you a comment if
any corrections are needed.)
The RS operator forwards key-value pairs to its child operator, and an RS
operator's columns are separated into KEY and VALUE by prepending
`ReduceField.KEY` and `ReduceField.VALUE` to the column names.
According to our records, we encountered a "cannot find field _col0 from
[0:key, 1:value]" issue when we set the correct ColumnExprMap on the GBY
operator. We found that the SEL operator generated by PEF tries to read
`_col0`, but the RS operator only provides `VALUE._col0`. So we prepend the
prefix in the column schema, as PEF refers to it during SEL operator creation.
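To make the naming convention concrete, here is a minimal stand-alone sketch
(not Hive code: `RsColumnPrefixSketch`, its `buildColumnExprMap` helper, and
the use of plain `String` values instead of real `ExprNodeDesc` objects are
all illustrative stand-ins) of why a child operator that looks up the bare
column name fails when the RS map is keyed with the `ReduceField.VALUE`
prefix:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: mimics how an RS operator's columnExprMap is keyed with
// a "VALUE." prefix (standing in for Utilities.ReduceField.VALUE + ".").
public class RsColumnPrefixSketch {

    static Map<String, String> buildColumnExprMap(String[] gbOutputNames) {
        Map<String, String> columnExprMap = new LinkedHashMap<>();
        for (String col : gbOutputNames) {
            // RS exposes the column as VALUE.<name>; a child operator that
            // resolves the bare name (e.g. "_col0") will not find it.
            columnExprMap.put("VALUE." + col, col);
        }
        return columnExprMap;
    }

    public static void main(String[] args) {
        Map<String, String> m =
            buildColumnExprMap(new String[]{"_col0", "_col1", "_col2"});
        // Lookups must use the prefixed name, not the bare one.
        System.out.println(m.containsKey("VALUE._col0"));
        System.out.println(m.containsKey("_col0"));
    }
}
```

Under this convention, a resolver handed the bare `_col0` finds nothing,
which mirrors the "cannot find field" error described above.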
I can't remember why DPPOptimization works fine without
`ReduceField.KEY/VALUE`, but our records say we verified that the other
components that create RS operators also use `ReduceField.KEY/VALUE` when
setting the RS operator's schema.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]