Rajesh Balamohan created HIVE-27050:
---------------------------------------
Summary: Iceberg: MOR: Restrict reducer extrapolation to contain
number of small files being created
Key: HIVE-27050
URL: https://issues.apache.org/jira/browse/HIVE-27050
Project: Hive
Issue Type: Improvement
Components: Iceberg integration
Reporter: Rajesh Balamohan
Scenario:
# Create a simple table in Iceberg (MOR mode), e.g. store_sales_delete_1
# Insert some data into it.
# Run an update statement as follows
## "update store_sales_delete_1 set ss_sold_time_sk=699060 where
ss_sold_time_sk=69906"
Hive estimates the number of reducers as "1". However,
"hive.tez.max.partition.factor", which defaults to "2.0", doubles the
number of reducers to 2.
To put this in perspective, the update creates very small positional delete
files spread across different reducers. This causes problems at read time,
because all of these files have to be opened.
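The reducer doubling described above can be illustrated with a short Java sketch. It is a simplified illustration under stated assumptions, not Hive's actual planner code: the class ReducerExtrapolation and the method extrapolate are hypothetical names, and the only inputs taken from this issue are the estimate of 1 and the factor of 2.0.

```java
// Simplified illustration only; this is NOT Hive's planner code.
// ReducerExtrapolation and extrapolate() are hypothetical names.
class ReducerExtrapolation {

    // Scales an estimated reducer count by a partition factor,
    // never returning fewer than one reducer.
    static int extrapolate(int estimatedReducers, float maxPartitionFactor) {
        return Math.max(1, (int) (estimatedReducers * maxPartitionFactor));
    }

    public static void main(String[] args) {
        int estimated = 1;    // reducer estimate from the scenario in this issue
        float factor = 2.0f;  // default of hive.tez.max.partition.factor per this issue
        System.out.println(extrapolate(estimated, factor)); // prints 2
    }
}
```

With a factor of 1.0 the same call returns the original estimate of 1, so each small update would write its positional deletes from a single reducer.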
# When Iceberg MOR tables are involved in update/delete/merge statements,
disable "hive.tez.max.partition.factor", or set it to "1.0" irrespective of
the user setting.
# Add explicit logs for easier debugging; users should not be left confused
about why the setting is not taking effect.
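One possible shape for the two points above is sketched below in Java. This is only a sketch under assumptions, not the actual Hive change: the class MorPartitionFactorPolicy, the method effectiveMaxPartitionFactor, the icebergMorWrite flag, and the log wording are all hypothetical, and java.util.logging stands in for whatever logger Hive uses.

```java
import java.util.logging.Logger;

// Hypothetical sketch of the proposal; names and log text are assumptions,
// not Hive's actual implementation.
class MorPartitionFactorPolicy {
    private static final Logger LOG =
        Logger.getLogger(MorPartitionFactorPolicy.class.getName());

    // Returns the factor to use for reducer extrapolation.
    // For Iceberg MOR update/delete/merge the user value is overridden with 1.0,
    // and the override is logged so the user can see why the setting was ignored.
    static float effectiveMaxPartitionFactor(boolean icebergMorWrite, float userFactor) {
        if (icebergMorWrite && userFactor != 1.0f) {
            LOG.info("Ignoring hive.tez.max.partition.factor=" + userFactor
                + " for Iceberg MOR update/delete/merge; using 1.0"
                + " to limit the number of small delete files");
            return 1.0f;
        }
        return userFactor;
    }
}
```

Queries that do not write to an Iceberg MOR table keep the user's value unchanged in this sketch, so the override stays limited to the case described in this issue.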