[ https://issues.apache.org/jira/browse/HIVE-26110?focusedWorklogId=752721&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752721 ]
ASF GitHub Bot logged work on HIVE-26110:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 05/Apr/22 09:00
Start Date: 05/Apr/22 09:00
Worklog Time Spent: 10m
Work Description: szlta commented on code in PR #3174:
URL: https://github.com/apache/hive/pull/3174#discussion_r842544398
##########
ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java:
##########
@@ -648,7 +648,12 @@ public ReduceSinkOperator getReduceSinkOp(List<Integer> partitionPositions, List
ArrayList<ExprNodeDesc> partCols = Lists.newArrayList();
for (Function<List<ExprNodeDesc>, ExprNodeDesc> customSortExpr : customSortExprs) {
- keyCols.add(customSortExpr.apply(allCols));
+ ExprNodeDesc colExpr = customSortExpr.apply(allCols);
+ // Custom sort expressions are marked as KEYs, which is required for sorting the rows that are going for
+ // a particular reducer instance. They also need to be marked as 'partition' columns for MapReduce shuffle
+ // phase, in order to gather the same keys to the same reducer instances.
+ keyCols.add(colExpr);
+ partCols.add(colExpr);
Review Comment:
Thx!
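For readers following along, the effect described in the code comment above can be illustrated with a toy model. This is only a sketch and not Hive code: the class `ShuffleSketch`, the method `countFiles`, and the round-robin distribution used when the key is not a partition column are all assumptions made for illustration. It also assumes each reducer writes one file per partition value it receives.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the shuffle phase (hypothetical, not part of Hive).
public class ShuffleSketch {

    /**
     * Counts output files, assuming every reducer writes one file
     * per distinct partition value that is routed to it.
     */
    public static int countFiles(List<String> rows, int reducers, boolean partitionByKey) {
        Set<String> files = new HashSet<>();
        for (int i = 0; i < rows.size(); i++) {
            String key = rows.get(i);
            // With the key among the partition columns, equal keys hash to the
            // same reducer. Without it, rows are spread with no regard to the
            // key (modelled here as round-robin, which is an assumption).
            int reducer = partitionByKey
                    ? Math.floorMod(key.hashCode(), reducers)
                    : i % reducers;
            files.add(reducer + "/" + key);
        }
        return files.size();
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rows.add("part=" + (i % 10)); // 10 distinct partition values
        }
        System.out.println("key only:        " + countFiles(rows, 4, false));
        System.out.println("key + partition: " + countFiles(rows, 4, true));
    }
}
```

In this model, routing by the key yields exactly one file per partition value, while ignoring the key lets the same partition value land on several reducers, which multiplies the file count.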
Issue Time Tracking
-------------------
Worklog Id: (was: 752721)
Time Spent: 50m (was: 40m)
> bulk insert into partitioned table creates lots of files in iceberg
> -------------------------------------------------------------------
>
> Key: HIVE-26110
> URL: https://issues.apache.org/jira/browse/HIVE-26110
> Project: Hive
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> For example, create the web_returns table from TPC-DS in Iceberg format and try to copy
> over data from the regular table, with something like "insert into web_returns_iceberg as
> select * from web_returns".
> This inserts the data correctly; however, there are a lot of files present in
> each partition. IMO, the dynamic sort optimisation isn't working properly, and this
> causes records not to be grouped in the final phase.