okumin commented on code in PR #4477:
URL: https://github.com/apache/hive/pull/4477#discussion_r1269422641


##########
ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java:
##########
@@ -86,6 +108,13 @@ public static ReduceWork createReduceWork(
 
     float maxPartitionFactor =
         context.conf.getFloatVar(HiveConf.ConfVars.TEZ_MAX_PARTITION_FACTOR);
+    
+    if (context.parseContext.getContext().getOperation() == Context.Operation.DELETE &&
+            isRestrictReducerExtrapolation(context)) {
+      LOG.debug("Overriding maxPartitionFactor to 1.0 to prevent creation of small files after delete operation");
+      maxPartitionFactor = 1f;

Review Comment:
   I quickly double-checked that it works as expected. Let's wait for the committers' comments on this approach.
   
   ACID
   
   ```
   $ beeline -e "
   > DROP TABLE IF EXISTS test;
   > CREATE TABLE test (id INT) STORED AS ORC TBLPROPERTIES ('transactional'='true');
   > INSERT INTO test VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
   > "
   ...
   $ beeline -e 'DELETE FROM test WHERE id = 5' --hiveconf hive.server2.in.place.progress=false --hiveconf hive.tez.auto.reducer.parallelism=true
   ...
   INFO  : 2023-07-20 12:52:55,637      Map 1: -/-      Reducer 2: 0/1  
   INFO  : 2023-07-20 12:52:57,161      Map 1: 0/1      Reducer 2: 0/1
   ```
   
   Iceberg
   ```
   $ beeline -e "
   > DROP TABLE IF EXISTS test;
   > CREATE TABLE test (id INT) STORED BY ICEBERG TBLPROPERTIES('format-version'='2');
   > INSERT INTO test VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
   > "
   ...
   $ beeline -e 'DELETE FROM test WHERE id = 5' --hiveconf hive.server2.in.place.progress=false --hiveconf hive.tez.auto.reducer.parallelism=true
   ...
   INFO  : 2023-07-20 12:55:56,104      Map 1: -/-      Reducer 2: 0/1  
   INFO  : 2023-07-20 12:55:58,337      Map 1: 0/1      Reducer 2: 0/1
   ```
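   
   To make the effect of forcing `maxPartitionFactor` to 1.0 concrete, here is a simplified Java model of the runtime upper bound on reducers under `hive.tez.auto.reducer.parallelism=true`. It assumes the bound is roughly the planner's estimated reducer count multiplied by `hive.tez.max.partition.factor`, floored at 1 and capped by a configured maximum. `ReducerRangeSketch`, `maxPartition`, and the numbers used are hypothetical and illustrative; this is not the actual `GenTezUtils` code.
   
   ```java
   // Hypothetical, simplified model of the reducer upper bound under Tez auto
   // reducer parallelism; not the actual GenTezUtils implementation.
   class ReducerRangeSketch {
   
       // Upper bound Tez may pick at runtime: estimate * factor, floored at 1,
       // then capped by the configured maximum number of reducers.
       public static int maxPartition(int estimatedReducers, float maxPartitionFactor, int maxReducers) {
           int upper = Math.max(1, (int) (estimatedReducers * maxPartitionFactor));
           return Math.min(upper, maxReducers);
       }
   
       public static void main(String[] args) {
           int estimated = 4;      // illustrative planner estimate
           int maxReducers = 1009; // illustrative cap on reducers
           // A factor above 1.0 lets the runtime bound exceed the estimate.
           System.out.println("factor 2.0 -> max " + maxPartition(estimated, 2.0f, maxReducers));
           // Forcing the factor to 1.0 (as the DELETE branch in this PR does) keeps the bound at the estimate.
           System.out.println("factor 1.0 -> max " + maxPartition(estimated, 1.0f, maxReducers));
       }
   }
   ```
   
   Running `main` prints `factor 2.0 -> max 8` and then `factor 1.0 -> max 4`. In this model, a factor of 1.0 still lets Tez shrink the reducer count but never lets it grow past the estimate, so a DELETE cannot fan out into more (and smaller) output files than planned.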
