[ 
https://issues.apache.org/jira/browse/SPARK-55353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huan Zheng updated SPARK-55353:
-------------------------------
    Attachment: 企业微信截图_742d6985-ae85-4f8d-b33b-d9445a65ba7a.png

>  Driver OOM with complex SQL queries: Add config to disable 
> SQLAppStatusListener
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-55353
>                 URL: https://issues.apache.org/jira/browse/SPARK-55353
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL, UI
>    Affects Versions: 3.5.7
>            Reporter: Huan Zheng
>            Priority: Minor
>         Attachments: desensitized_sql_example.txt, 
> 企业微信截图_742d6985-ae85-4f8d-b33b-d9445a65ba7a.png
>
>
> PROBLEM:
> Severe driver OOM regression from Spark 3.2.2 to 3.5.7 for complex SQL 
> queries.
> - Spark 3.2.2: Same query runs with 2GB driver memory
> - Spark 3.5.7: Same query OOMs with 8GB driver memory
> - Memory regression: more than 4x increase in driver memory requirements
> Query characteristics:
> - ~200 stages, ~1000 tasks per stage
> - Complex feature engineering with multiple UDFs
> - 12 LEFT JOINs with subqueries
> - 100+ output columns
> ================================================================================
> ROOT CAUSE:
> Heap dump analysis of a 2GB driver shows ~15.6M AccumulatorMetadata objects 
> consuming ~1.5GB of memory, held by SQLAppStatusListener.
> Memory calculation:
> - Spark 3.2.2: 200 stages × 1000 tasks × 25 accumulators/task = 5M objects 
> (~500MB)
> - Spark 3.5.7: 155 stages × 1000 tasks × 100 accumulators/task = 15.5M 
> objects (~1.5GB)
> Accumulator count per task increased 4x due to new metrics added between 
> versions:
> - SPARK-36620 (3.3.0): Push-based shuffle metrics
> - SPARK-40711 (3.4.0): Window spill metrics
> - SPARK-43214 (3.5.0): Driver-side metrics
> SQLAppStatusListener collects all these metrics in driver memory, causing OOM 
> for queries with many stages and tasks.
> ================================================================================
> SOLUTION:
> Add a new static configuration, spark.sql.ui.appStatusListener.enabled, that 
> controls whether SQLAppStatusListener is loaded.
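
The memory calculation under ROOT CAUSE can be checked with a few lines of Python. This is only a sketch of the arithmetic in the description: the stage, task, and accumulator counts come from the issue text, while the figure of roughly 100 bytes per object is an assumption derived from the reported totals, not a measured value.

```python
# Reproduce the object-count estimate from the issue description.
# ASSUMPTION: ~100 bytes per AccumulatorMetadata object, back-derived from
# the reported "5M objects (~500MB)" and "15.5M objects (~1.5GB)" figures.
BYTES_PER_OBJECT = 100


def accumulator_objects(stages, tasks_per_stage, accumulators_per_task):
    """Number of per-task accumulator entries retained on the driver."""
    return stages * tasks_per_stage * accumulators_per_task


def estimated_megabytes(object_count, bytes_per_object=BYTES_PER_OBJECT):
    """Rough retained size in MB (decimal) for the given object count."""
    return object_count * bytes_per_object / 1_000_000


# Figures quoted in the issue for each Spark version.
spark_322 = accumulator_objects(200, 1000, 25)
spark_357 = accumulator_objects(155, 1000, 100)

for version, count in (("3.2.2", spark_322), ("3.5.7", spark_357)):
    print(f"Spark {version}: {count:,} objects, "
          f"~{estimated_megabytes(count):,.0f} MB")
```

Under that assumption the estimates land at about 500 MB and 1,550 MB, in line with the figures quoted in the description.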

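Because the proposed setting is described as static, it would presumably have to be supplied at application launch rather than changed on a running session. The sketch below only builds a spark-submit command line that passes it with --conf. The key name is the one proposed in this issue and is assumed not to exist in released Spark versions; the application file feature_job.py is a hypothetical placeholder.

```python
# Key proposed in SPARK-55353. ASSUMPTION: it is merged under this exact name;
# it is not claimed to be available in any released Spark version.
PROPOSED_KEY = "spark.sql.ui.appStatusListener.enabled"


def spark_submit_args(app, conf):
    """Build a spark-submit argument list with one --conf per setting."""
    args = ["spark-submit"]
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


# "feature_job.py" is a hypothetical application script.
command = spark_submit_args("feature_job.py", {PROPOSED_KEY: "false"})
print(" ".join(command))
```

The same key=value pair could equally be placed in spark-defaults.conf, which is the other usual place for settings that must be fixed before the application starts.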


--
This message was sent by Atlassian Jira
(v8.20.10#820010)

