[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Franck Tago updated SPARK-44759:
--------------------------------
    Component/s: Deploy
                 Spark Core

> Do not combine multiple Generate operators in the same WholeStageCodeGen node
> because it can easily cause OOM failures if arrays are relatively large
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-44759
>                 URL: https://issues.apache.org/jira/browse/SPARK-44759
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy, Optimizer, Spark Core
>    Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0,
> 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.2, 3.4.0, 3.4.1
>            Reporter: Franck Tago
>            Priority: Major
>         Attachments: image-2023-08-10-09-27-24-124.png,
> image-2023-08-10-09-29-24-804.png, image-2023-08-10-09-32-46-163.png,
> image-2023-08-10-09-33-47-788.png,
> wholestagecodegen_wc1_debug_wholecodegen_passed
>
>
> This issue has existed since the WholeStageCodeGen (WSCG) implementation of
> the Generate node.
> Because WSCG computes rows in batches, the combination of WSCG and the
> explode operation consumes a large share of the executor's dedicated memory.
> This is even more pronounced when the WSCG node contains multiple explode
> nodes, which is the case when flattening a nested array.
> A Generate node used to flatten an array generally produces significantly
> more output rows than it receives as input, and the number of output rows is
> drastically higher still when flattening a nested array.
> When we combine more than one Generate node into the same WholeStageCodeGen
> node, we run a high risk of running out of memory, for multiple reasons:
> 1- As you can see from the snapshots added in the comments, the rows created
> in the nested loop are saved in a writer buffer. In this case, because the
> rows were large, the job failed with an Out Of Memory error.
> 2- The generated WholeStageCodeGen code results in a nested loop that, for
> each input row, explodes the parent array and then explodes the inner array.
> The rows accumulate in the writer buffer without accounting for row size.
> Please view the attached Spark GUI and Spark DAG screenshots.
> In my case the WholeStageCodeGen node includes 2 explode nodes.
> Because the array elements are large, we end up with an Out Of Memory error.
>
> I recommend that we do not merge multiple explode nodes into the same whole
> stage code gen node; doing so leads to potential memory issues.
> In our case, the job execution failed with an OOM error because the WSCG
> code executed a nested for loop.



-- This message was sent by Atlassian Jira (v8.20.10#820010)
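The nested-loop behavior described above can be sketched in plain Python. This is an illustration only, not Spark's actual generated code: the names `explode_twice` and `writer_buffer` are hypothetical stand-ins for the fused loop that WholeStageCodeGen emits when two Generate (explode) operators land in one codegen stage, and for the buffered row writer that accumulates its output. The point it demonstrates is the multiplicative row growth: output size is input rows times outer-array length times inner-array length, and every row is buffered before any is flushed.

```python
# Simplified model (assumption, not real Spark codegen) of two fused
# Generate/explode operators over a nested array column.
def explode_twice(input_rows):
    """Flatten a nested array column, buffering every output row."""
    writer_buffer = []  # rows accumulate here regardless of their size
    for row in input_rows:               # loop over input rows
        for inner_array in row:          # first explode: outer array
            for element in inner_array:  # second explode: inner array
                writer_buffer.append(element)
    return writer_buffer

# 10 input rows, each holding a 100 x 100 nested array, yield
# 10 * 100 * 100 = 100,000 buffered rows before anything is flushed.
rows = [[[0] * 100 for _ in range(100)] for _ in range(10)]
buffered = explode_twice(rows)
assert len(buffered) == 10 * 100 * 100
```

With large array elements, a buffer of this size is what exhausts executor memory; splitting the two explodes into separate stages would let the first stage's output be consumed incrementally instead.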