okumin commented on code in PR #4477:
URL: https://github.com/apache/hive/pull/4477#discussion_r1266998178
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java:
##########
@@ -148,6 +148,14 @@ public void initialize(@Nullable Configuration configuration, Properties serDePr
// TODO: remove once we have both Fanout and ClusteredWriter available: HIVE-25948
HiveConf.setIntVar(configuration, HiveConf.ConfVars.HIVEOPTSORTDYNAMICPARTITIONTHRESHOLD, 1);
HiveConf.setVar(configuration, HiveConf.ConfVars.DYNAMICPARTITIONINGMODE, "nonstrict");
+
+ Context.Operation operation = HiveCustomStorageHandlerUtils.getWriteOperation(configuration,
+ serDeProperties.getProperty(Catalogs.NAME));
+
+ if (operation != null) {
+ HiveConf.setFloatVar(configuration, HiveConf.ConfVars.TEZ_MAX_PARTITION_FACTOR, 1f);
Review Comment:
I personally think it is reasonable to inject the logic explicitly into
GenTezUtils or somewhere similar. One request: I'd like it to be pluggable,
because other formats would hit the same issue. As far as I checked on my
machine, Hive ACID has the same problem. Note that these are my opinions;
committers could have different ideas, or they might consider this expected
behavior.
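To make the "pluggable" request a bit more concrete, here is a rough sketch of the kind of hook I have in mind. Every name in it (`PartitionFactorOverride`, `PartitionFactorOverrides`, `FormatOverrides`) is hypothetical and nothing below exists in Hive today; it only mirrors the `operation != null` check from the diff, and the real integration point and types would be up to the committers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalDouble;

/** Hypothetical hook: lets a table format override the max partition factor for a write. */
interface PartitionFactorOverride {
  /** Returns an override, or empty to keep the configured value. */
  OptionalDouble maxPartitionFactor(String tableFormat, String writeOperation);
}

/** Hypothetical registry that planner code (e.g. around GenTezUtils) could consult. */
final class PartitionFactorOverrides {
  private final List<PartitionFactorOverride> overrides = new ArrayList<>();

  void register(PartitionFactorOverride override) {
    overrides.add(override);
  }

  /** The first non-empty override wins; otherwise the configured value is returned unchanged. */
  double resolve(String tableFormat, String writeOperation, double configured) {
    for (PartitionFactorOverride override : overrides) {
      OptionalDouble value = override.maxPartitionFactor(tableFormat, writeOperation);
      if (value.isPresent()) {
        return value.getAsDouble();
      }
    }
    return configured;
  }
}

/** Hypothetical factory mirroring the diff: any write operation pins the factor to 1. */
final class FormatOverrides {
  private FormatOverrides() {
  }

  static PartitionFactorOverride forWrites(String tableFormat) {
    return (format, operation) ->
        tableFormat.equals(format) && operation != null
            ? OptionalDouble.of(1.0)
            : OptionalDouble.empty();
  }
}
```

With something like this, Iceberg and Hive ACID could each register their own override instead of mutating the session configuration inside a SerDe's `initialize`.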
As for the parameter, I guess you tested it with 4.0.0-alpha-2, since the
param was merged only recently. It enforces auto reducer parallelism.
```
$ beeline -e "
> DROP TABLE IF EXISTS test;
> CREATE TABLE test (id INT) STORED BY ICEBERG TBLPROPERTIES('format-version'='2');
> INSERT INTO test VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
> "
```
```
$ beeline -e 'DELETE FROM test WHERE id = 5' --hiveconf hive.server2.in.place.progress=false --hiveconf hive.tez.auto.reducer.parallelism=true
...
INFO : 2023-07-18 14:19:45,200 Map 1: 0(+1)/1 Reducer 2: 0/2
INFO : 2023-07-18 14:19:48,227 Map 1: 1/1 Reducer 2: 0(+1)/2
...
$ beeline -e 'DELETE FROM test WHERE id = 5' --hiveconf hive.server2.in.place.progress=false --hiveconf hive.tez.auto.reducer.parallelism=true --hiveconf hive.tez.auto.reducer.parallelism.min.threshold=0.0
...
INFO : 2023-07-18 14:20:23,730 Map 1: 0(+1)/1 Reducer 2: 0/2
INFO : 2023-07-18 14:20:27,271 Map 1: 1/1 Reducer 2: 0(+1)/1
```