[ 
https://issues.apache.org/jira/browse/HIVE-27050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17749559#comment-17749559
 ] 

Denys Kuzmenko commented on HIVE-27050:
---------------------------------------

Merged to master.
Thanks [~difin] for the patch and [~okumin] for the review!

> Iceberg: MOR: Restrict reducer extrapolation to contain number of small files 
> being created
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-27050
>                 URL: https://issues.apache.org/jira/browse/HIVE-27050
>             Project: Hive
>          Issue Type: Improvement
>          Components: Iceberg integration
>            Reporter: Rajesh Balamohan
>            Assignee: Dmitriy Fingerman
>            Priority: Major
>              Labels: pull-request-available
>
> Scenario:
>  # Create a simple Iceberg table in MOR (merge-on-read) mode, e.g. store_sales_delete_1.
>  # Insert some data into it.
>  # Run an update statement such as:
>  ## "update store_sales_delete_1 set ss_sold_time_sk=699060 where 
> ss_sold_time_sk=69906"
> Hive estimates the number of reducers as "1". However, 
> "hive.tez.max.partition.factor", which defaults to "2.0", doubles the 
> number of reducers.
> As a result, the update writes very small positional delete files 
> spread across different reducers. This causes problems at read time, 
> because all of these files have to be opened.
>  
>  # When Iceberg MOR tables are involved in update/delete/merge statements, disable 
> "hive.tez.max.partition.factor", or set it to "1.0" irrespective of the user 
> setting.
>  # Add explicit log messages for easier debugging, so that users are not confused 
> about why the setting is not taking effect.
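
For readers following the description above, here is a rough sketch of the arithmetic and of the proposed override. It is a simplified model and not Hive's actual code: the function names, the ceiling-based scaling, and the log wording are all assumptions made for illustration. Only the property name and its "2.0" default come from the issue text.

```python
import logging
import math

log = logging.getLogger("reducer_sketch")

# Default of hive.tez.max.partition.factor, as stated in the issue.
DEFAULT_MAX_PARTITION_FACTOR = 2.0


def extrapolated_reducers(estimated: int, max_partition_factor: float) -> int:
    """Simplified model (assumption): scale the estimate by the factor, never below 1."""
    return max(1, math.ceil(estimated * max_partition_factor))


def effective_factor(user_factor: float, is_iceberg_mor_write: bool) -> float:
    """Hypothetical override: force 1.0 for Iceberg MOR update/delete/merge and say why."""
    if is_iceberg_mor_write and user_factor != 1.0:
        log.info(
            "Ignoring hive.tez.max.partition.factor=%s for an Iceberg MOR write; "
            "using 1.0 to limit the number of small delete files",
            user_factor,
        )
        return 1.0
    return user_factor


estimated = 1  # Hive's estimate in the scenario above
before = extrapolated_reducers(estimated, DEFAULT_MAX_PARTITION_FACTOR)
after = extrapolated_reducers(
    estimated, effective_factor(DEFAULT_MAX_PARTITION_FACTOR, is_iceberg_mor_write=True)
)
print(before, after)  # 2 1
```

With an estimate of 1, the default factor yields 2 reducers, each writing its own small positional delete file. With the override, the single estimated reducer is kept, and the log line tells the user why their setting was not applied.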



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
