okumin commented on code in PR #4477:
URL: https://github.com/apache/hive/pull/4477#discussion_r1269422641
##########
ql/src/java/org/apache/hadoop/hive/ql/parse/GenTezUtils.java:
##########
@@ -86,6 +108,13 @@ public static ReduceWork createReduceWork(
float maxPartitionFactor =
context.conf.getFloatVar(HiveConf.ConfVars.TEZ_MAX_PARTITION_FACTOR);
+
+ if (context.parseContext.getContext().getOperation() == Context.Operation.DELETE &&
+ isRestrictReducerExtrapolation(context)) {
+ LOG.debug("Overriding maxPartitionFactor to 1.0 to prevent creation of small files after delete operation");
+ maxPartitionFactor = 1f;
Review Comment:
I quickly double-checked that it works as expected.
ACID
```
$ beeline -e "
> DROP TABLE IF EXISTS test;
> CREATE TABLE test (id INT) STORED AS ORC TBLPROPERTIES ('transactional'='true');
> INSERT INTO test VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
> "
...
$ beeline -e 'DELETE FROM test WHERE id = 5' --hiveconf hive.server2.in.place.progress=false --hiveconf hive.tez.auto.reducer.parallelism=true
...
INFO : 2023-07-20 12:52:55,637 Map 1: -/- Reducer 2: 0/1
INFO : 2023-07-20 12:52:57,161 Map 1: 0/1 Reducer 2: 0/1
```
Iceberg
```
$ beeline -e "
> DROP TABLE IF EXISTS test;
> CREATE TABLE test (id INT) STORED BY ICEBERG TBLPROPERTIES('format-version'='2');
> INSERT INTO test VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
> "
...
$ beeline -e 'DELETE FROM test WHERE id = 5' --hiveconf hive.server2.in.place.progress=false --hiveconf hive.tez.auto.reducer.parallelism=true
...
INFO : 2023-07-20 12:55:56,104 Map 1: -/- Reducer 2: 0/1
INFO : 2023-07-20 12:55:58,337 Map 1: 0/1 Reducer 2: 0/1
```
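To spell out why forcing the factor to 1.0 matters, here is a simplified sketch of how a partition factor can bound the reducer count when auto reducer parallelism is on. It assumes the upper bound is the estimated reducer count scaled by the factor, floored at 1 and capped at a configured maximum; the class and method names are hypothetical, the numbers are illustrative, and this is not the actual `GenTezUtils` code.

```java
public class PartitionBounds {
    // Hypothetical helper: scale the estimated reducer count by a factor,
    // keep at least one reducer, and never exceed the configured maximum.
    static int scaled(int estimatedReducers, float factor, int maxReducers) {
        int n = Math.max(1, (int) (estimatedReducers * factor));
        return Math.min(n, maxReducers);
    }

    public static void main(String[] args) {
        int estimated = 4;      // illustrative estimate, not taken from the PR
        int maxReducers = 1009; // illustrative cap

        // With a factor of 2.0 the planner may allocate up to twice the estimate.
        System.out.println("factor 2.0 -> up to " + scaled(estimated, 2f, maxReducers));

        // With the factor overridden to 1.0 the upper bound equals the estimate,
        // so a DELETE cannot fan out into extra reducers (and extra small files).
        System.out.println("factor 1.0 -> up to " + scaled(estimated, 1f, maxReducers));
    }
}
```

Under these assumptions, an estimate of one reducer stays at one reducer, which is consistent with the `Reducer 2: 0/1` lines in the logs above.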
##########
ql/src/java/org/apache/hadoop/hive/ql/ddl/DDLUtils.java:
##########
@@ -231,10 +231,17 @@ public static void validateTableIsIceberg(org.apache.hadoop.hive.ql.metadata.Tab
}
public static boolean isIcebergTable(Table table) {
- return table.isNonNative() &&
- table.getStorageHandler().getType() == StorageHandlerTypes.ICEBERG;
+ return table.isNonNative() &&
+ ((table.getStorageHandler() != null && table.getStorageHandler().getType() == StorageHandlerTypes.ICEBERG) ||
+ isIcebergTableType(table.getTTable().getParameters()));
Review Comment:
Is there any case where StorageHandler#getType != ICEBERG but
isIcebergTableType = true?
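To make the question concrete, here is a minimal stand-alone sketch of the patched predicate, enumerating the combinations in which the parameter-based check is the deciding clause. The types are hypothetical stand-ins for the Hive classes, and the `table_type` parameter lookup inside `isIcebergTableType` is an assumption for illustration, not something shown in the diff.

```java
import java.util.Map;

public class IcebergCheckSketch {
    // Hypothetical stand-in for StorageHandlerTypes; null models a missing handler.
    enum HandlerType { ICEBERG, OTHER }

    // Assumed shape of the parameter-based check (not confirmed by the diff).
    static boolean isIcebergTableType(Map<String, String> params) {
        return "ICEBERG".equalsIgnoreCase(params.get("table_type"));
    }

    // Mirrors the structure of the patched isIcebergTable:
    // non-native AND (handler says Iceberg OR parameters say Iceberg).
    static boolean isIcebergTable(boolean nonNative, HandlerType handlerType,
                                  Map<String, String> params) {
        return nonNative
            && (handlerType == HandlerType.ICEBERG || isIcebergTableType(params));
    }

    public static void main(String[] args) {
        Map<String, String> iceberg = Map.of("table_type", "ICEBERG");
        Map<String, String> none = Map.of();

        // The second clause only changes the result in these two situations:
        System.out.println("null handler, Iceberg params:  "
            + isIcebergTable(true, null, iceberg));
        System.out.println("other handler, Iceberg params: "
            + isIcebergTable(true, HandlerType.OTHER, iceberg));

        // Everywhere else the handler check alone already decides the outcome.
        System.out.println("Iceberg handler, no params:    "
            + isIcebergTable(true, HandlerType.ICEBERG, none));
        System.out.println("other handler, no params:      "
            + isIcebergTable(true, HandlerType.OTHER, none));
    }
}
```

If the first two situations cannot occur in practice, the extra clause only serves as a null guard and could be simplified.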
##########
ql/src/java/org/apache/hadoop/hive/ql/ddl/DDLUtils.java:
##########
@@ -231,10 +231,17 @@ public static void validateTableIsIceberg(org.apache.hadoop.hive.ql.metadata.Tab
}
public static boolean isIcebergTable(Table table) {
Review Comment:
Wow, I originally thought the change would also be available for Delta Lake, Hudi,
etc., but it looks like we hardcode a lot of logic for Iceberg...
I will create a ticket to push this kind of logic into StorageHandler...
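As a rough illustration of what pushing this into the handler could look like, here is a hypothetical sketch. None of these names exist in Hive: `StorageHandlerSketch` and `restrictsReducerExtrapolationOnDelete` are invented for this example, and the real interface and the eventual ticket may look quite different.

```java
public class HandlerDispatchSketch {
    // Hypothetical capability on the handler; defaults to "no restriction".
    interface StorageHandlerSketch {
        default boolean restrictsReducerExtrapolationOnDelete() {
            return false;
        }
    }

    // A handler that opts in, as an Iceberg-like format might.
    static class IcebergLikeHandler implements StorageHandlerSketch {
        @Override
        public boolean restrictsReducerExtrapolationOnDelete() {
            return true;
        }
    }

    // A handler that keeps the default behaviour.
    static class OtherHandler implements StorageHandlerSketch {
    }

    // The planner asks the handler instead of testing for a specific format.
    static float maxPartitionFactor(StorageHandlerSketch handler, boolean isDelete,
                                    float configured) {
        boolean restrict = isDelete
            && handler != null
            && handler.restrictsReducerExtrapolationOnDelete();
        return restrict ? 1f : configured;
    }

    public static void main(String[] args) {
        System.out.println(maxPartitionFactor(new IcebergLikeHandler(), true, 2f));
        System.out.println(maxPartitionFactor(new OtherHandler(), true, 2f));
    }
}
```

With something along these lines, other table formats could opt in by overriding one method rather than adding another format-specific branch in the planner.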
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]