[ 
https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Franck Tago updated SPARK-44759:
--------------------------------
    Component/s: Deploy
                 Spark Core

> Do not combine multiple Generate operators in the same WholeStageCodeGen node 
> because it can easily cause OOM failures if arrays are relatively large
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-44759
>                 URL: https://issues.apache.org/jira/browse/SPARK-44759
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy, Optimizer, Spark Core
>    Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 
> 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.2, 3.4.0, 3.4.1
>            Reporter: Franck Tago
>            Priority: Major
>         Attachments: image-2023-08-10-09-27-24-124.png, 
> image-2023-08-10-09-29-24-804.png, image-2023-08-10-09-32-46-163.png, 
> image-2023-08-10-09-33-47-788.png, 
> wholestagecodegen_wc1_debug_wholecodegen_passed
>
>
> This has been an issue since the WSCG implementation of the Generate node.
> Because WSCG computes rows in batches, the combination of WSCG and the 
> explode operation consumes a large share of the dedicated executor memory. This is 
> even more pronounced when the WSCG node contains multiple explode nodes, as is 
> the case when flattening a nested array.
> The Generate node used to flatten an array generally produces significantly 
> more output rows than input rows, and the multiplier is drastically higher when 
> flattening a nested array.
> When we combine more than one Generate node in the same WholeStageCodeGen 
> node, we run a high risk of running out of memory, for multiple reasons:
> 1- As you can see from the snapshots added in the comments, the rows created in 
> the nested loop are saved in a writer buffer. In this case, because the rows 
> were big, the job failed with an Out Of Memory error.
> 2- The generated WholeStageCodeGen code results in a nested loop that, for each 
> row, explodes the parent array and then explodes the inner array. The rows 
> are accumulated in the writer buffer without accounting for the row size.
> Please view the attached Spark GUI and Spark DAG screenshots.
> In my case the WholeStageCodeGen node includes 2 explode nodes.
> Because the array elements are large, we end up with an Out Of Memory error.
>  
> I recommend that we do not merge multiple explode nodes into the same 
> WholeStageCodeGen node, since doing so leads to potential memory issues.
> In our case, the job execution failed with an OOM error because the 
> WSCG compiled into a nested for loop.
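The nested-loop behavior the report describes can be sketched in plain Python. This is a hypothetical illustration, not Spark's actual generated code: two chained "explode" steps, as when flattening a nested array, form a nested loop whose output size is the product of the array lengths, and every produced row is buffered before a downstream operator can consume it.

```python
def explode(rows, column):
    """Emit one output row per element of row[column], analogous to Spark's explode."""
    for row in rows:
        for element in row[column]:
            out = dict(row)
            out[column] = element
            yield out

# One input row holding a 3 x 4 nested array yields 3 * 4 = 12 output rows.
# With large arrays (or large elements), the rows accumulated in the writer
# buffer between the two Generate steps can exhaust executor memory.
input_rows = [{"nested": [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]}]
outer = explode(input_rows, "nested")        # first Generate (outer array)
flattened = list(explode(outer, "nested"))   # second Generate (inner arrays)
print(len(flattened))  # 12 rows from a single input row
```

Keeping each Generate in its own stage would let the first explode's output be consumed incrementally instead of being materialized inside one fused nested loop.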



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
