[ https://issues.apache.org/jira/browse/HIVE-25967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ádám Szita updated HIVE-25967:
------------------------------
Description:
This hack removes residual expressions from the file scan task just before
split serialization.
Residuals can sometimes take up so much space in the payload that the Tez AM
runs out of memory (OOM).
Unfortunately, the Tez AM doesn't distribute splits in a streamed way; that is,
it serializes all splits for a job before sending them out to executors. Some
residuals may take ~1 MB in memory; multiplied by thousands of splits, this
could kill the Tez AM JVM.
Until streamed split distribution is implemented, we will strip residuals out
of the splits.
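A minimal sketch of the idea, assuming a hypothetical `SplitSketch` class that stands in for a serialized split (it is not Iceberg's `FileScanTask` or Hive's actual split class, and the residual is modelled as a plain `String`). It drops the residual before Java serialization and compares payload sizes. The ~1 MB residual and the 2,000-split count are illustrative numbers, not measurements.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class ResidualStrippingSketch {

  // Hypothetical stand-in for a split: a file path plus a residual filter expression.
  // Not the real Iceberg/Hive types; the residual is modelled as a plain String.
  static final class SplitSketch implements Serializable {
    private static final long serialVersionUID = 1L;
    final String path;
    final String residual; // null once stripped

    SplitSketch(String path, String residual) {
      this.path = path;
      this.residual = residual;
    }

    // Returns a copy without the residual; the original object is left untouched.
    SplitSketch withoutResidual() {
      return new SplitSketch(path, null);
    }
  }

  // Serializes a split with plain Java serialization and returns the payload size in bytes.
  static int serializedSize(SplitSketch split) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(split);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.size();
  }

  // Builds an artificially large residual of at least the given number of characters.
  static String bigResidual(int chars) {
    StringBuilder sb = new StringBuilder(chars);
    while (sb.length() < chars) {
      sb.append("col_a = 1 OR ");
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    SplitSketch split =
        new SplitSketch("s3://bucket/table/data/part-0.parquet", bigResidual(1 << 20));
    int withResidual = serializedSize(split);
    int stripped = serializedSize(split.withoutResidual());
    int splits = 2_000; // illustrative split count

    System.out.printf("per split: %d bytes with residual, %d bytes without%n",
        withResidual, stripped);
    // Rough total if every split is serialized up front before being sent out.
    System.out.printf("for %d splits: ~%d MB vs ~%d KB%n",
        splits, (long) withResidual * splits >> 20, (long) stripped * splits >> 10);
  }
}
```

The sketch copies the split instead of mutating it, so any planning-side code that still needs the residual keeps its own reference; only the copy handed to serialization is slimmed down.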
> Prevent residual expressions from getting serialized in Iceberg splits
> ----------------------------------------------------------------------
>
> Key: HIVE-25967
> URL: https://issues.apache.org/jira/browse/HIVE-25967
> Project: Hive
> Issue Type: Bug
> Reporter: Ádám Szita
> Assignee: Ádám Szita
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)