okumin commented on code in PR #4477:
URL: https://github.com/apache/hive/pull/4477#discussion_r1269422641
##########
ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java:
##########
@@ -86,6 +108,13 @@ public static ReduceWork createReduceWork(
float maxPartitionFactor =
context.conf.getFloatVar(HiveConf.ConfVars.TEZ_MAX_PARTITION_FACTOR);
+
+ if (context.parseContext.getContext().getOperation() == Context.Operation.DELETE &&
+ isRestrictReducerExtrapolation(context)) {
+ LOG.debug("Overriding maxPartitionFactor to 1.0 to prevent creation of small files after delete operation");
+ maxPartitionFactor = 1f;
Review Comment:
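I quickly double-checked that it works as expected: with
hive.tez.auto.reducer.parallelism=true, the DELETE runs Reducer 2 with a
single task (0/1) on both an ACID table and an Iceberg table. Let's wait for
the committers' comments on this approach.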
ACID
```
$ beeline -e "
> DROP TABLE IF EXISTS test;
> CREATE TABLE test (id INT) STORED AS ORC TBLPROPERTIES ('transactional'='true');
> INSERT INTO test VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
> "
...
$ beeline -e 'DELETE FROM test WHERE id = 5' --hiveconf hive.server2.in.place.progress=false --hiveconf hive.tez.auto.reducer.parallelism=true
...
INFO : 2023-07-20 12:52:55,637 Map 1: -/- Reducer 2: 0/1
INFO : 2023-07-20 12:52:57,161 Map 1: 0/1 Reducer 2: 0/1
```
Iceberg
```
$ beeline -e "
> DROP TABLE IF EXISTS test;
> CREATE TABLE test (id INT) STORED BY ICEBERG TBLPROPERTIES ('format-version'='2');
> INSERT INTO test VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
> "
...
$ beeline -e 'DELETE FROM test WHERE id = 5' --hiveconf hive.server2.in.place.progress=false --hiveconf hive.tez.auto.reducer.parallelism=true
...
INFO : 2023-07-20 12:55:56,104 Map 1: -/- Reducer 2: 0/1
INFO : 2023-07-20 12:55:58,337 Map 1: 0/1 Reducer 2: 0/1
```
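As a rough illustration of why forcing the factor to 1.0 keeps the reducer count from growing, here is a minimal standalone sketch. It assumes the upper bound on partitions is derived as max(1, (int) (estimatedReducers * maxPartitionFactor)) and then clamped to a configured maximum. The class name, method name, and the sample values (a factor of 2.0 and a cap of 1009) are hypothetical and chosen only for the example; this is not the code in GenTezUtils.

```java
public class PartitionFactorSketch {

    // Hypothetical helper: upper bound on the number of partitions Tez may pick,
    // assuming it scales the estimated reducer count by maxPartitionFactor and
    // clamps the result to [1, maxReducers].
    static int maxPartitions(int estimatedReducers, float maxPartitionFactor, int maxReducers) {
        int scaled = Math.max(1, (int) (estimatedReducers * maxPartitionFactor));
        return Math.min(scaled, maxReducers);
    }

    public static void main(String[] args) {
        int maxReducers = 1009; // example cap, an assumption for this sketch

        // With an example factor of 2.0, one estimated reducer may be extrapolated to two.
        System.out.println(maxPartitions(1, 2f, maxReducers));

        // With the factor overridden to 1.0, the bound stays at the estimate.
        System.out.println(maxPartitions(1, 1f, maxReducers));
    }
}
```

Under these assumptions, the override means a DELETE never gets more partitions than the estimate, which matches the single reducer task seen in the logs above.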