[
https://issues.apache.org/jira/browse/HIVE-27050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17749559#comment-17749559
]
Denys Kuzmenko commented on HIVE-27050:
---------------------------------------
Merged to master.
Thanks [~difin] for the patch and [~okumin] for the review!
> Iceberg: MOR: Restrict reducer extrapolation to contain number of small files
> being created
> -------------------------------------------------------------------------------------------
>
> Key: HIVE-27050
> URL: https://issues.apache.org/jira/browse/HIVE-27050
> Project: Hive
> Issue Type: Improvement
> Components: Iceberg integration
> Reporter: Rajesh Balamohan
> Assignee: Dmitriy Fingerman
> Priority: Major
> Labels: pull-request-available
>
> Scenario:
> # Create a simple table in Iceberg (MOR mode), e.g. store_sales_delete_1
> # Insert some data into it.
> # Run an update statement as follows
> ## "update store_sales_delete_1 set ss_sold_time_sk=699060 where
> ss_sold_time_sk=69906"
> Hive estimates the number of reducers as "1", but because
> "hive.tez.max.partition.factor" defaults to "2.0", it doubles the
> number of reducers.
> To put this in perspective, the update creates very small positional delete
> files spread across different reducers. This causes problems at read time,
> as all of these files must be opened.
>
> # When Iceberg MOR tables are involved in update/delete/merge statements,
> disable "hive.tez.max.partition.factor", or set it to "1.0" irrespective of
> the user setting.
> # Add explicit logs for easier debugging; users shouldn't be confused about
> why the setting is not taking effect.
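
For readers skimming the thread, the effect described in the issue and the proposed override can be illustrated with a small Python sketch. This is a simplified model under stated assumptions, not Hive's actual implementation: the function names, the round-up rule, and the log message are all hypothetical, and only the property name "hive.tez.max.partition.factor" and its default of 2.0 come from the issue text.

```python
import logging
import math

logger = logging.getLogger("reducer_sketch")


def extrapolated_reducers(estimated: int, max_partition_factor: float) -> int:
    """Simplified model (assumption): scale the estimate by the factor, round up."""
    return max(1, math.ceil(estimated * max_partition_factor))


def effective_factor(user_factor: float, is_iceberg_mor_write: bool) -> float:
    """Hypothetical override: pin the factor to 1.0 for Iceberg MOR
    update/delete/merge statements, and log that the user setting is ignored."""
    if is_iceberg_mor_write and user_factor != 1.0:
        # Hypothetical log message; the real wording in Hive may differ.
        logger.info(
            "Ignoring hive.tez.max.partition.factor=%s for Iceberg MOR write; using 1.0",
            user_factor,
        )
        return 1.0
    return user_factor


# Scenario from the issue: an estimate of 1 reducer with the default factor 2.0.
without_fix = extrapolated_reducers(1, 2.0)
with_fix = extrapolated_reducers(1, effective_factor(2.0, True))
```

In this model the default factor turns an estimate of one reducer into two, each of which would write its own small positional delete file, while the override keeps the estimate at one and leaves a log line explaining why the user setting was not applied.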
--
This message was sent by Atlassian Jira
(v8.20.10#820010)