okumin commented on code in PR #4477:
URL: https://github.com/apache/hive/pull/4477#discussion_r1264613366


##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java:
##########
@@ -148,6 +148,14 @@ public void initialize(@Nullable Configuration configuration, Properties serDePr
     // TODO: remove once we have both Fanout and ClusteredWriter available: HIVE-25948
     HiveConf.setIntVar(configuration, HiveConf.ConfVars.HIVEOPTSORTDYNAMICPARTITIONTHRESHOLD, 1);
     HiveConf.setVar(configuration, HiveConf.ConfVars.DYNAMICPARTITIONINGMODE, "nonstrict");
+
+    Context.Operation operation = HiveCustomStorageHandlerUtils.getWriteOperation(configuration,
+            serDeProperties.getProperty(Catalogs.NAME));
+
+    if (operation != null) {
+      HiveConf.setFloatVar(configuration, HiveConf.ConfVars.TEZ_MAX_PARTITION_FACTOR, 1f);

Review Comment:
   I tested how this behaves because I was not confident that it works as intended.
   
   ## Test queries
   
   ### Prep
   
   ```
   beeline -e "
   DROP TABLE IF EXISTS test;
   CREATE TABLE test (id INT) STORED BY ICEBERG TBLPROPERTIES('format-version'='2');
   INSERT INTO test VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
   "
   ```
   
   ### DELETE
   
   ```
   beeline -e 'DELETE FROM test WHERE id = 5' --hiveconf hive.server2.in.place.progress=false --hiveconf hive.tez.auto.reducer.parallelism=true
   ```
   
   ## Test result
   
   ### The original version
   
   
   [This](https://github.com/okumin/hive/commit/59b84b9d7835f97b5e9df872ef00986678498976) is the tested revision. Two reducers are set up and launched as reported.
   
   ```
   $ beeline -e 'DELETE FROM test WHERE id = 5' --hiveconf hive.server2.in.place.progress=false --hiveconf hive.tez.auto.reducer.parallelism=true
   ...
   INFO  : 2023-07-16 04:55:39,740      Map 1: 0(+1)/1  Reducer 2: 0/2  
   INFO  : 2023-07-16 04:55:42,259      Map 1: 1/1      Reducer 2: 0(+1)/2
   ```
   
   ```
   $ ./bin/logs hive-hiveserver2-655b558bb-gzwh8 | grep HIVE-27050 | tail -n 1
   hive-hiveserver2-655b558bb-gzwh8: 2023-07-16T04:55:31,760  INFO [f3ecdcb4-f45b-405f-986e-0dbf525c7c87 HiveServer2-Handler-Pool: Thread-61] parse.GenTezUtils: HIVE-27050: max partition factor=2.0, max partition=2
   ```
   
   ### The first patch
   
   
   [This](https://github.com/okumin/hive/commit/a19ee2b5536539d066ecf77c369bc0f7bc85a2ca) is the tested revision. One reducer is set up and launched.
   
   ```
   $ beeline -e 'DELETE FROM test WHERE id = 5' --hiveconf hive.server2.in.place.progress=false --hiveconf hive.tez.auto.reducer.parallelism=true
   ...
   INFO  : 2023-07-16 05:22:30,905      Map 1: 0/1      Reducer 2: 0/1  
   INFO  : 2023-07-16 05:22:35,454      Map 1: 0(+1)/1  Reducer 2: 0/1
   ```
   
   ```
   $ ./bin/logs hive-hiveserver2-75bc6bc94c-bhxxc | grep HIVE-27050 | tail -n 1
   hive-hiveserver2-75bc6bc94c-bhxxc: 2023-07-16T05:22:23,324  INFO [18f9ace5-2016-4430-8155-ae95c254148b HiveServer2-Handler-Pool: Thread-61] parse.GenTezUtils: HIVE-27050: max partition factor=1.0, max partition=1
   ```
   
   ### The second (current) patch
   
   
   [This](https://github.com/okumin/hive/commit/0e149323aa07b661a0c8eb259fa514bcb72c02a9) is the tested revision. It does not appear to work as expected: two reducers are still set up, and the operation is logged as null.
   
   ```
   $ beeline -e 'DELETE FROM test WHERE id = 5' --hiveconf hive.server2.in.place.progress=false --hiveconf hive.tez.auto.reducer.parallelism=true
   ...
   INFO  : 2023-07-16 05:58:27,235      Map 1: 1/1      Reducer 2: 0/2  
   INFO  : 2023-07-16 05:58:27,741      Map 1: 1/1      Reducer 2: 0(+1)/2
   ```
   
   ```
   $ ./bin/logs hive-hiveserver2-5876b658cd-8zxwp | grep HIVE-27050 | tail -n 3
   hive-hiveserver2-5876b658cd-8zxwp: 2023-07-16T05:58:13,110  INFO [62373162-813c-409b-834d-335f2f72efc1 HiveServer2-Handler-Pool: Thread-61] parse.GenTezUtils: HIVE-27050: max partition factor=2.0, max partition=2
   hive-hiveserver2-5876b658cd-8zxwp: 2023-07-16T05:58:13,112  INFO [62373162-813c-409b-834d-335f2f72efc1 HiveServer2-Handler-Pool: Thread-61] hive.HiveIcebergSerDe: HIVE-27050: operation: null
   hive-hiveserver2-5876b658cd-8zxwp: 2023-07-16T05:58:28,411  INFO [HiveServer2-Background-Pool: Thread-111] hive.HiveIcebergSerDe: HIVE-27050: operation: null
   ```
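   
   The logged pairs (factor 2.0 with max partition 2, factor 1.0 with max partition 1) are consistent with the max partition being the estimated reducer count scaled by the factor and then capped. Below is a minimal sketch of that arithmetic, not Hive's actual `GenTezUtils` code. `MaxPartitionSketch` and `maxPartition` are hypothetical names, and the estimate of 1 reducer and the cap of 1009 are assumptions chosen to match the logs above.
   
   ```java
   // Hypothetical illustration of how a max partition factor could bound
   // the number of reducers Tez auto-parallelism may pick.
   final class MaxPartitionSketch {
     // Scale the estimated reducer count by the factor, keep at least 1,
     // and never exceed the configured reducer cap.
     static int maxPartition(int estimatedReducers, float maxPartitionFactor, int maxReducers) {
       int scaled = Math.max(1, (int) (estimatedReducers * maxPartitionFactor));
       return Math.min(scaled, maxReducers);
     }
   
     public static void main(String[] args) {
       int estimatedReducers = 1; // assumption: planner estimate for this small DELETE
       int maxReducers = 1009;    // assumption: reducer cap
       // Factor 2.0, as logged for the original version and the second patch.
       System.out.println(maxPartition(estimatedReducers, 2.0f, maxReducers));
       // Factor 1.0, as logged for the first patch.
       System.out.println(maxPartition(estimatedReducers, 1.0f, maxReducers));
     }
   }
   ```
   
   Under these assumptions, a factor of 1.0 keeps the reducer count at the estimate, which is why the setting only helps if it is applied before the planner reads it.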




