[ 
https://issues.apache.org/jira/browse/HIVE-25967?focusedWorklogId=752945&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-752945
 ]

ASF GitHub Bot logged work on HIVE-25967:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 05/Apr/22 15:26
            Start Date: 05/Apr/22 15:26
    Worklog Time Spent: 10m 
      Work Description: szlta opened a new pull request, #3178:
URL: https://github.com/apache/hive/pull/3178

   I originally thought the hack was only needed when residuals are 
present, so I added this condition:
   
   
https://github.com/apache/hive/commit/1aa6ce800004798e78ea53c3bec2beedb5f55b6c#diff-9487d7073613adf5132783cf905ea72164eb4c19461c50e5ce3cd735bb5704a3R127
   
   What I didn't know is that in some cases the residuals() invocation may 
return True while the underlying expression is still a longer construct. The 
residuals() invocation actually evaluates that expression against the partition 
information found in the base scan file task. As a result, the residuals 
are left untouched and can still cause an OOM.
   
   This addendum removes the aforementioned unnecessary condition.
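
A minimal sketch of the behaviour described above, under stated assumptions: `Task`, `stripIfResidualPresent`, and `stripAlways` are hypothetical names invented for illustration and are not the Iceberg or Hive API. The actual change is the one in the linked pull request. The sketch only shows why a guard based on the evaluated residual can leave a large stored expression in place.

```java
// Illustrative sketch only: Task is a hypothetical stand-in, not the Iceberg FileScanTask API.
final class ResidualStripSketch {

    static final String ALWAYS_TRUE = "true";

    /** Hypothetical task holding the filter expression that would be serialized with the split. */
    static final class Task {
        String storedExpression;             // what ends up in the serialized payload
        final boolean trueForThisPartition;  // assumed outcome of evaluating it against partition data

        Task(String storedExpression, boolean trueForThisPartition) {
            this.storedExpression = storedExpression;
            this.trueForThisPartition = trueForThisPartition;
        }

        /** Mimics a residual lookup that evaluates the stored expression against the partition. */
        String residual() {
            return trueForThisPartition ? ALWAYS_TRUE : storedExpression;
        }
    }

    /** Earlier behaviour as described: strip only when a residual appears to be present. */
    static void stripIfResidualPresent(Task task) {
        if (!ALWAYS_TRUE.equals(task.residual())) {
            task.storedExpression = ALWAYS_TRUE;
        }
    }

    /** Behaviour after the addendum as described: strip unconditionally. */
    static void stripAlways(Task task) {
        task.storedExpression = ALWAYS_TRUE;
    }

    public static void main(String[] args) {
        // A long filter that happens to evaluate to true for this partition.
        String longFilter = "col IN (" + "1, ".repeat(1000) + "1)";

        Task guarded = new Task(longFilter, true);
        stripIfResidualPresent(guarded);
        System.out.println("guarded strip keeps " + guarded.storedExpression.length() + " chars");

        Task unconditional = new Task(longFilter, true);
        stripAlways(unconditional);
        System.out.println("unconditional strip keeps " + unconditional.storedExpression.length() + " chars");
    }
}
```

In this toy model the guarded variant skips the strip because the evaluated residual looks trivially true, so the long expression would still be serialized. The unconditional variant always replaces it.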




Issue Time Tracking
-------------------

    Worklog Id:     (was: 752945)
    Time Spent: 50m  (was: 40m)

> Prevent residual expressions from getting serialized in Iceberg splits
> ----------------------------------------------------------------------
>
>                 Key: HIVE-25967
>                 URL: https://issues.apache.org/jira/browse/HIVE-25967
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ádám Szita
>            Assignee: Ádám Szita
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> This hack removes residual expressions from the file scan task just before 
> split serialization.
> Residuals can sometimes take up too much space in the payload, causing the 
> Tez AM to OOM.
> Unfortunately, the Tez AM doesn't distribute splits in a streamed way; that 
> is, it serializes all splits for a job before sending them out to executors. 
> Some residuals may take ~1 MB in memory, which, multiplied by thousands of 
> splits, could kill the Tez AM JVM.
> Until streamed split distribution is implemented, we will kick residuals 
> out of the split.
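
The arithmetic in the description can be sketched roughly as follows. The split count of 5,000 is an assumed example value for "thousands of splits", and `SplitPayloadEstimate` is a hypothetical helper, not part of Hive or Tez.

```java
// Back-of-the-envelope estimate; the numbers are illustrative assumptions.
final class SplitPayloadEstimate {

    /** Total bytes held at once if every split carries its own copy of the residual. */
    static long totalBytes(long residualBytesPerSplit, long splitCount) {
        // multiplyExact throws ArithmeticException on overflow instead of wrapping silently.
        return Math.multiplyExact(residualBytesPerSplit, splitCount);
    }

    public static void main(String[] args) {
        long residualBytes = 1L << 20; // ~1 MB per residual, as in the description
        long splits = 5_000;           // assumed value for "thousands of splits"
        long total = totalBytes(residualBytes, splits);
        System.out.printf("%d splits x %d bytes = %.1f GiB%n",
                splits, residualBytes, total / (double) (1L << 30));
    }
}
```

Because all splits are serialized before any are sent out, a total of this order would have to fit in the Tez AM heap at once, which is the motivation for dropping residuals from the split.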



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
