szlta commented on code in PR #3174:
URL: https://github.com/apache/hive/pull/3174#discussion_r842445223


##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java:
##########
@@ -648,7 +648,12 @@ public ReduceSinkOperator getReduceSinkOp(List<Integer> 
partitionPositions, List
       ArrayList<ExprNodeDesc> partCols = Lists.newArrayList();
 
       for (Function<List<ExprNodeDesc>, ExprNodeDesc> customSortExpr : 
customSortExprs) {
-        keyCols.add(customSortExpr.apply(allCols));
+        ExprNodeDesc colExpr = customSortExpr.apply(allCols);
+        // Custom sort expressions are marked as KEYs, which is required for 
sorting the rows that are going for
+        // a particular reducer instance. They also need to be marked as 
'partition' columns for MapReduce shuffle
+        // phase, in order to gather the same keys to the same reducer 
instances.
+        keyCols.add(colExpr);
+        partCols.add(colExpr);

Review Comment:
   If customSortExprs are present, then we can be sure that partitionPositions 
are empty as per 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java#L592-L596
   As for dpColNames - I'm not sure why it would matter? With Iceberg tables 
the table schema already contains the partition columns too, it's just that 
Hive doesn't think of these as partition columns, rather just regular columns.
   I think the schema should be fine, all columns will serve as VALUE (with 
Iceberg we want to write out the partition values into the file too, as in some 
cases the spec can have a non-identity type of partition transform) plus the 
ones identified by customSortExpr will be added as KEY for sorting purposes 
(only) additionally.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to