okumin commented on code in PR #4477:
URL: https://github.com/apache/hive/pull/4477#discussion_r1264613366
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java:
##########
@@ -148,6 +148,14 @@ public void initialize(@Nullable Configuration configuration, Properties serDePr
     // TODO: remove once we have both Fanout and ClusteredWriter available: HIVE-25948
     HiveConf.setIntVar(configuration, HiveConf.ConfVars.HIVEOPTSORTDYNAMICPARTITIONTHRESHOLD, 1);
     HiveConf.setVar(configuration, HiveConf.ConfVars.DYNAMICPARTITIONINGMODE, "nonstrict");
+
+    Context.Operation operation = HiveCustomStorageHandlerUtils.getWriteOperation(configuration,
+        serDeProperties.getProperty(Catalogs.NAME));
+
+    if (operation != null) {
+      HiveConf.setFloatVar(configuration, HiveConf.ConfVars.TEZ_MAX_PARTITION_FACTOR, 1f);
Review Comment:
I was not confident about how this behaves, so I tested it.
## Test queries
### Prep
```
beeline -e "
DROP TABLE IF EXISTS test;
CREATE TABLE test (id INT) STORED BY ICEBERG
TBLPROPERTIES('format-version'='2');
INSERT INTO test VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
"
```
### DELETE
```
beeline -e 'DELETE FROM test WHERE id = 5' \
  --hiveconf hive.server2.in.place.progress=false \
  --hiveconf hive.tez.auto.reducer.parallelism=true
```
## Test result
### The original version
[This](https://github.com/okumin/hive/commit/59b84b9d7835f97b5e9df872ef00986678498976)
is the tested revision. Two reducers are set up and launched as reported.
```
$ beeline -e 'DELETE FROM test WHERE id = 5' \
    --hiveconf hive.server2.in.place.progress=false \
    --hiveconf hive.tez.auto.reducer.parallelism=true
...
INFO : 2023-07-16 04:55:39,740 Map 1: 0(+1)/1 Reducer 2: 0/2
INFO : 2023-07-16 04:55:42,259 Map 1: 1/1 Reducer 2: 0(+1)/2
```
```
$ ./bin/logs hive-hiveserver2-655b558bb-gzwh8 | grep HIVE-27050 | tail -n 1
hive-hiveserver2-655b558bb-gzwh8: 2023-07-16T04:55:31,760 INFO
[f3ecdcb4-f45b-405f-986e-0dbf525c7c87 HiveServer2-Handler-Pool: Thread-61]
parse.GenTezUtils: HIVE-27050: max partition factor=2.0, max partition=2
```
### The first patch
[This](https://github.com/okumin/hive/commit/a19ee2b5536539d066ecf77c369bc0f7bc85a2ca)
is the tested revision. One reducer is set up and launched.
```
$ beeline -e 'DELETE FROM test WHERE id = 5' \
    --hiveconf hive.server2.in.place.progress=false \
    --hiveconf hive.tez.auto.reducer.parallelism=true
...
INFO : 2023-07-16 05:22:30,905 Map 1: 0/1 Reducer 2: 0/1
INFO : 2023-07-16 05:22:35,454 Map 1: 0(+1)/1 Reducer 2: 0/1
```
```
$ ./bin/logs hive-hiveserver2-75bc6bc94c-bhxxc | grep HIVE-27050 | tail -n 1
hive-hiveserver2-75bc6bc94c-bhxxc: 2023-07-16T05:22:23,324 INFO
[18f9ace5-2016-4430-8155-ae95c254148b HiveServer2-Handler-Pool: Thread-61]
parse.GenTezUtils: HIVE-27050: max partition factor=1.0, max partition=1
```
### The second (current) patch
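[This](https://github.com/okumin/hive/commit/0e149323aa07b661a0c8eb259fa514bcb72c02a9)
is the tested revision. It does not appear to work as expected: the SerDe logs
`operation: null`, the max partition factor stays at 2.0, and two reducers are
set up, the same as with the original version.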
```
$ beeline -e 'DELETE FROM test WHERE id = 5' \
    --hiveconf hive.server2.in.place.progress=false \
    --hiveconf hive.tez.auto.reducer.parallelism=true
...
INFO : 2023-07-16 05:58:27,235 Map 1: 1/1 Reducer 2: 0/2
INFO : 2023-07-16 05:58:27,741 Map 1: 1/1 Reducer 2: 0(+1)/2
```
```
$ ./bin/logs hive-hiveserver2-5876b658cd-8zxwp | grep HIVE-27050 | tail -n 3
hive-hiveserver2-5876b658cd-8zxwp: 2023-07-16T05:58:13,110 INFO
[62373162-813c-409b-834d-335f2f72efc1 HiveServer2-Handler-Pool: Thread-61]
parse.GenTezUtils: HIVE-27050: max partition factor=2.0, max partition=2
hive-hiveserver2-5876b658cd-8zxwp: 2023-07-16T05:58:13,112 INFO
[62373162-813c-409b-834d-335f2f72efc1 HiveServer2-Handler-Pool: Thread-61]
hive.HiveIcebergSerDe: HIVE-27050: operation: null
hive-hiveserver2-5876b658cd-8zxwp: 2023-07-16T05:58:28,411 INFO
[HiveServer2-Background-Pool: Thread-111] hive.HiveIcebergSerDe: HIVE-27050:
operation: null
```
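
To make the three results easier to compare, here is a minimal, self-contained sketch of why a `null` operation leaves the reducer count at 2. It is not the Hive code. It assumes that the logged `max partition` is the estimated reducer count multiplied by the max partition factor and capped by a reducer limit, which is consistent with the logged values above but not confirmed here. `MaxPartitionSketch`, `Operation`, `effectiveFactor`, `maxPartition`, the estimated reducer count of 1, and the cap of 1009 are all hypothetical names and values chosen for illustration.

```java
public class MaxPartitionSketch {
    // Hypothetical stand-in for Context.Operation, used only for illustration.
    public enum Operation { DELETE, UPDATE, MERGE, OTHER }

    // Models the guard in the patch: the factor is forced to 1.0 only when
    // a write operation could be resolved; otherwise the configured value stays.
    public static float effectiveFactor(Operation operation, float configuredFactor) {
        return operation != null ? 1.0f : configuredFactor;
    }

    // Assumed formula: estimated reducers times factor, capped by a reducer limit.
    public static int maxPartition(int estimatedReducers, float factor, int maxReducers) {
        int scaled = (int) (estimatedReducers * factor);
        return Math.min(scaled, maxReducers);
    }

    public static void main(String[] args) {
        int estimatedReducers = 1;   // assumption: one reducer estimated for this tiny table
        int maxReducers = 1009;      // assumption: an arbitrary cap, large enough not to matter
        float configured = 2.0f;     // the factor reported in the logs above

        // operation == null, as logged for the current patch: the factor stays at 2.0
        System.out.println(maxPartition(estimatedReducers,
            effectiveFactor(null, configured), maxReducers));

        // operation resolved, which is what the patch expects: the factor becomes 1.0
        System.out.println(maxPartition(estimatedReducers,
            effectiveFactor(Operation.DELETE, configured), maxReducers));
    }
}
```

Under these assumptions the first call yields 2 and the second yields 1, matching `Reducer 2: 0/2` for the current patch and `Reducer 2: 0/1` for the first patch.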
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]