[ 
https://issues.apache.org/jira/browse/HIVE-25843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marton Bod resolved HIVE-25843.
-------------------------------
    Resolution: Fixed

> Add flag to disable Iceberg FileIO config serialization
> -------------------------------------------------------
>
>                 Key: HIVE-25843
>                 URL: https://issues.apache.org/jira/browse/HIVE-25843
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Marton Bod
>            Assignee: Marton Bod
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Hive serializes the Iceberg table object into each individual split. Because 
> the FileIO is part of the Iceberg table and carries its own Hadoop 
> configuration, that configuration dominates the size of the serialized 
> split. In our tests, this serialized config made Iceberg splits 15-20x 
> larger than normal Hive splits, which led to OOM errors in some of our perf 
> tests.
> This PR introduces a config flag that turns off this config serialization 
> and lets the deserializer side fill in the config values instead. This works 
> for Hive executors, since they already have all the config values at hand. 
> Based on local tests, this can reduce the Iceberg split size by ~20x.
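
The idea described above can be illustrated with a minimal, self-contained sketch. It uses plain Java serialization and a hypothetical `Split` class with a map-based config. It is not the actual Hive or Iceberg code, and every class, field, and property name in it is an assumption made for illustration. When the flag is off, the config is left out of the serialized split and the receiving side fills it in from the configuration it already holds locally.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class SplitConfigSketch {

    /** Hypothetical stand-in for a split that carries a FileIO-style config. */
    static class Split implements Serializable {
        private static final long serialVersionUID = 1L;

        final String path;
        // Left null when config serialization is disabled, so it adds
        // almost nothing to the serialized form.
        private HashMap<String, String> conf;

        Split(String path, Map<String, String> conf, boolean serializeConf) {
            this.path = path;
            this.conf = serializeConf ? new HashMap<>(conf) : null;
        }

        /** Returns the shipped config, or falls back to the receiver's local one. */
        Map<String, String> conf(Map<String, String> localConf) {
            if (conf == null) {
                conf = new HashMap<>(localConf);
            }
            return conf;
        }
    }

    static byte[] serialize(Split split) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(split);
        }
        return bytes.toByteArray();
    }

    static Split deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Split) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up properties standing in for a large Hadoop configuration.
        Map<String, String> conf = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            conf.put("example.property." + i, "value-" + i);
        }

        byte[] withConf = serialize(new Split("/warehouse/t/data-0.parquet", conf, true));
        byte[] withoutConf = serialize(new Split("/warehouse/t/data-0.parquet", conf, false));
        System.out.println("split bytes with config:    " + withConf.length);
        System.out.println("split bytes without config: " + withoutConf.length);

        // The receiving side restores the config from what it has locally.
        Split restored = deserialize(withoutConf);
        System.out.println("config entries after restore: " + restored.conf(conf).size());
    }
}
```

The fallback only works when the receiving process already holds an equivalent configuration, which is the case the description gives for Hive executors. The size ratio printed by this sketch depends entirely on the made-up properties and says nothing about the figures reported in the issue.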



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
