[ https://issues.apache.org/jira/browse/HIVE-25843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17474527#comment-17474527 ]
Marton Bod commented on HIVE-25843:
-----------------------------------
Pushed to master. Thanks [~pvary] for reviewing!
> Add flag to disable Iceberg FileIO config serialization
> -------------------------------------------------------
>
> Key: HIVE-25843
> URL: https://issues.apache.org/jira/browse/HIVE-25843
> Project: Hive
> Issue Type: Improvement
> Reporter: Marton Bod
> Assignee: Marton Bod
> Priority: Major
> Labels: pull-request-available
> Time Spent: 50m
> Remaining Estimate: 0h
>
> Hive serializes the Iceberg table object into each individual split. Since
> the FileIO is part of the Iceberg table and carries its own Hadoop
> configuration, this configuration is the dominant factor in the size of the
> serialized split. In our tests we found that, because of this serialized
> config, Iceberg splits are 15-20x larger than normal Hive splits (which led
> to OOM in some of our perf tests).
> This PR proposes to introduce a config which can turn off this config
> serialization and let the deserializer side fill in the config values
> instead (which works for Hive executors, since they already have all the
> config values at hand). This can reduce the Iceberg split size by ~20x based
> on local tests.
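
The idea in the description can be illustrated with a minimal, self-contained sketch using plain Java serialization. This is not the Hive or Iceberg implementation: `SketchSplit`, `serializeConfig`, `fillConfig`, and `sampleConfig` are hypothetical names invented for the illustration, and a plain `Map` stands in for the Hadoop configuration held by the FileIO.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

public class SplitConfigSketch {

    // Hypothetical stand-in for a split that carries a FileIO-style config.
    static class SketchSplit implements Serializable {
        private static final long serialVersionUID = 1L;

        final String path;
        final boolean serializeConfig;   // hypothetical flag: false = skip the config
        transient Map<String, String> config;  // written by hand in writeObject

        SketchSplit(String path, Map<String, String> config, boolean serializeConfig) {
            this.path = path;
            this.config = config;
            this.serializeConfig = serializeConfig;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();
            if (serializeConfig) {
                out.writeObject(new HashMap<>(config));
            }
        }

        @SuppressWarnings("unchecked")
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            config = serializeConfig ? (Map<String, String>) in.readObject() : null;
        }

        // Deserializer side: fall back to the locally available config.
        void fillConfig(Map<String, String> localConfig) {
            if (config == null) {
                config = localConfig;
            }
        }
    }

    // Builds an artificial config with the given number of entries.
    static Map<String, String> sampleConfig(int entries) {
        Map<String, String> conf = new HashMap<>();
        for (int i = 0; i < entries; i++) {
            conf.put("sketch.property.number." + i, "value-" + i);
        }
        return conf;
    }

    static byte[] serialize(Serializable obj) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static SketchSplit deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (SketchSplit) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = sampleConfig(500);
        byte[] full = serialize(new SketchSplit("data/file.parquet", conf, true));
        byte[] lean = serialize(new SketchSplit("data/file.parquet", conf, false));
        System.out.println("with config:    " + full.length + " bytes");
        System.out.println("without config: " + lean.length + " bytes");

        // The receiving side restores the config from what it already has.
        SketchSplit restored = deserialize(lean);
        restored.fillConfig(conf);
        System.out.println("restored entries: " + restored.config.size());
    }
}
```

The size ratio printed by this sketch depends entirely on the artificial config and says nothing about the 15-20x figure measured in Hive. It only shows the mechanism: the config is left out of the serialized bytes and supplied again on the receiving side.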