[ 
https://issues.apache.org/jira/browse/HIVE-27050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denys Kuzmenko resolved HIVE-27050.
-----------------------------------
    Fix Version/s: 4.0.0-beta-1
       Resolution: Fixed

> Iceberg: MOR: Restrict reducer extrapolation to contain number of small files 
> being created
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-27050
>                 URL: https://issues.apache.org/jira/browse/HIVE-27050
>             Project: Hive
>          Issue Type: Improvement
>          Components: Iceberg integration
>            Reporter: Rajesh Balamohan
>            Assignee: Dmitriy Fingerman
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0-beta-1
>
>
> Scenario:
>  # Create a simple table in Iceberg (MOR mode), e.g. store_sales_delete_1.
>  # Insert some data into it.
>  # Run an update statement such as:
>  ## "update store_sales_delete_1 set ss_sold_time_sk=699060 where 
> ss_sold_time_sk=69906"
> Hive estimates the number of reducers as "1". However, because 
> "hive.tez.max.partition.factor" defaults to "2.0", the number of reducers 
> is doubled.
> As a result, the update writes very small positional delete files spread 
> across different reducers. This causes problems at read time, because all 
> of those files must be opened.
>  
> Proposed changes:
>  # When Iceberg MOR tables are involved in update/delete/merge statements, 
> disable "hive.tez.max.partition.factor", or set it to "1.0" irrespective of 
> the user setting.
>  # Log this explicitly for easier debugging, so that users are not confused 
> about why their setting does not take effect.
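
On a build that does not yet contain this fix, a session-level override along
the following lines may approximate the first point above. This is only a
sketch, not part of the HIVE-27050 change itself: it assumes the property can
be set per session in the deployment at hand, and it reuses the table and
statement from the scenario.

```sql
-- Assumption: manual, session-scoped workaround, not the fix from HIVE-27050.
-- Pin the extrapolation factor to 1.0 so the reducer estimate is not doubled.
SET hive.tez.max.partition.factor=1.0;

-- The update from the scenario, run in the same session.
UPDATE store_sales_delete_1
SET ss_sold_time_sk = 699060
WHERE ss_sold_time_sk = 69906;
```

Whether this actually reduces the number of positional delete files depends on
the other reducer-parallelism settings in effect, so the resulting file count
is worth checking after the statement runs.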



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
