[
https://issues.apache.org/jira/browse/HIVE-27050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Denys Kuzmenko resolved HIVE-27050.
-----------------------------------
Fix Version/s: 4.0.0-beta-1
Resolution: Fixed
> Iceberg: MOR: Restrict reducer extrapolation to contain number of small files
> being created
> -------------------------------------------------------------------------------------------
>
> Key: HIVE-27050
> URL: https://issues.apache.org/jira/browse/HIVE-27050
> Project: Hive
> Issue Type: Improvement
> Components: Iceberg integration
> Reporter: Rajesh Balamohan
> Assignee: Dmitriy Fingerman
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0-beta-1
>
>
> Scenario:
> # Create a simple Iceberg table in MOR (merge-on-read) mode, e.g.
> store_sales_delete_1.
> # Insert some data into it.
> # Run an update statement such as:
> ## "update store_sales_delete_1 set ss_sold_time_sk=699060 where
> ss_sold_time_sk=69906"
> Hive estimates the number of reducers as "1". However, because
> "hive.tez.max.partition.factor" defaults to "2.0", the number of reducers
> is doubled.
> As a result, the update writes very small positional delete files spread
> across the reducers. This hurts read performance, because every one of those
> files has to be opened when the table is read.
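
To make the arithmetic above concrete, here is a minimal Java sketch of how a partition factor scales a reducer estimate. It is a simplified model, not Hive's actual code: the class name, the method name, and the upper limit of 1009 reducers are assumptions chosen for illustration.

```java
// Simplified model of reducer extrapolation; not Hive's actual implementation.
final class ReducerExtrapolationSketch {

    // Scales the estimated reducer count by the partition factor,
    // keeping the result between 1 and the given upper limit.
    static int scaledReducers(int estimatedReducers, float maxPartitionFactor, int reducerLimit) {
        int scaled = Math.max(1, (int) (estimatedReducers * maxPartitionFactor));
        return Math.min(scaled, reducerLimit);
    }

    public static void main(String[] args) {
        int limit = 1009; // illustrative upper limit (assumption)
        System.out.println(scaledReducers(1, 2.0f, limit)); // 2: the estimate of 1 is doubled
        System.out.println(scaledReducers(1, 1.0f, limit)); // 1: the estimate is left as-is
    }
}
```

With a factor of 1.0 the estimate of one reducer is kept, so under this model a small update would write its positional deletes from a single reducer instead of two.
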
>
> # When Iceberg MOR tables are involved in update/delete/merge statements,
> disable "hive.tez.max.partition.factor", or set it to "1.0" irrespective of
> the user setting.
> # Add explicit logs for easier debugging, so that users are not confused
> about why their setting is not taking effect.
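
The two proposals above could be sketched roughly as follows. This is a hypothetical illustration, not the actual HIVE-27050 patch: the class name, the method name, and the log message are invented for this example, and it uses java.util.logging rather than Hive's own logging setup.

```java
import java.util.logging.Logger;

// Hypothetical sketch of the proposed behavior; not the actual HIVE-27050 patch.
final class MorPartitionFactorSketch {
    private static final Logger LOG =
        Logger.getLogger(MorPartitionFactorSketch.class.getName());

    // Returns the factor to apply: 1.0 for Iceberg MOR update/delete/merge
    // writes, otherwise the value configured by the user.
    static float effectiveMaxPartitionFactor(boolean icebergMorWrite, float userFactor) {
        if (icebergMorWrite && userFactor != 1.0f) {
            // Explicit log so the user can see why the setting was not applied.
            LOG.info("Ignoring hive.tez.max.partition.factor=" + userFactor
                + " for Iceberg MOR write; using 1.0 to limit small delete files");
            return 1.0f;
        }
        return userFactor;
    }

    public static void main(String[] args) {
        System.out.println(effectiveMaxPartitionFactor(true, 2.0f));  // overridden to 1.0
        System.out.println(effectiveMaxPartitionFactor(false, 2.0f)); // user value kept
    }
}
```

Logging only when the user value is actually overridden keeps the message meaningful: it appears exactly in the case where the configured setting has no effect.
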