[ 
https://issues.apache.org/jira/browse/SPARK-55353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huan Zheng updated SPARK-55353:
-------------------------------
    Summary:  Driver OOM regression with complex SQL queries: Add config to 
disable SQLAppStatusListener  (was:  Driver OOM with complex SQL queries: Add 
config to disable SQLAppStatusListener)

>  Driver OOM regression with complex SQL queries: Add config to disable 
> SQLAppStatusListener
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-55353
>                 URL: https://issues.apache.org/jira/browse/SPARK-55353
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL, UI
>    Affects Versions: 3.5.7
>            Reporter: Huan Zheng
>            Priority: Major
>         Attachments: desensitized_sql_example.txt, 
> 企业微信截图_6a9bc73d-dbd5-4013-b517-5c074114b511.png, 
> 企业微信截图_742d6985-ae85-4f8d-b33b-d9445a65ba7a.png
>
>
> PROBLEM:
> Severe driver OOM regression from Spark 3.2.2 to 3.5.7 for complex SQL 
> queries.
>  - Spark 3.2.2: the query runs with 2GB driver memory
>  - Spark 3.5.7 (with an extra patch for SPARK-45439): the same query OOMs 
> with 8GB driver memory
>  - Memory regression: at least a 4x increase in driver memory requirements
> Query characteristics:
>  - ~200 stages
>  - Complex feature engineering with multiple UDFs
>  - 12 LEFT JOINs with subqueries
>  - 100+ output columns
> ================================================================================
> ROOT CAUSE:
> Heap dump analysis of a driver with 2GB of memory shows 15.6M 
> AccumulatorMetadata objects consuming ~1.5GB of memory, held by 
> SQLAppStatusListener.
> The accumulator count per task increased 4x because of new metrics added 
> between versions:
>  - SPARK-36620 (3.3.0): Push-based shuffle metrics
>  - SPARK-40711 (3.4.0): Window spill metrics
>  - SPARK-43214 (3.5.0): Driver-side metrics
> SQLAppStatusListener collects all these metrics in driver memory, causing OOM 
> for queries with many stages and tasks.
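
The heap dump figures above imply a per-object cost that can be checked with a little arithmetic. The sketch below is only a back-of-envelope estimate: it reads "~1.5GB" as 1.5 GiB, and the tasks-per-stage and accumulators-per-task values are made-up placeholders, not measurements from the report.

```python
# Figures quoted in the heap dump analysis.
ACCUMULATOR_OBJECTS = 15_600_000
RETAINED_BYTES = 1.5 * 1024**3  # assumption: "~1.5GB" read as 1.5 GiB


def bytes_per_object(retained_bytes: float, object_count: int) -> float:
    """Average retained size of one AccumulatorMetadata object."""
    return retained_bytes / object_count


def projected_bytes(stages: int, tasks_per_stage: int,
                    accumulators_per_task: int, per_object: float) -> float:
    """Rough upper bound if every task's accumulators stay on the driver."""
    return stages * tasks_per_stage * accumulators_per_task * per_object


per_object = bytes_per_object(RETAINED_BYTES, ACCUMULATOR_OBJECTS)
print(f"~{per_object:.0f} bytes per object")

# 200 stages comes from the report; 1000 tasks per stage and 20 vs. 80
# accumulators per task are hypothetical values chosen to show a 4x jump.
baseline = projected_bytes(200, 1000, 20, per_object)
regressed = projected_bytes(200, 1000, 80, per_object)
print(f"baseline ~{baseline / 1024**3:.2f} GiB, "
      f"regressed ~{regressed / 1024**3:.2f} GiB")
```

Because the projection is linear in the accumulator count, a 4x increase in accumulators per task translates directly into a 4x increase in retained driver memory under these assumptions.
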
> ================================================================================
> SOLUTION:
> Add a new static configuration, spark.sql.ui.appStatusListener.enabled, that 
> controls whether SQLAppStatusListener is loaded.
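
Since the setting is described as static, it would presumably have to be supplied when the application is launched rather than changed at runtime. A minimal sketch of how a launcher script might pass it, assuming the key is adopted exactly as proposed (it is not an existing Spark setting according to this report) and using a hypothetical application file name:

```python
import shlex

# Key proposed in this issue; treated here as an assumption, not a shipped setting.
PROPOSED_KEY = "spark.sql.ui.appStatusListener.enabled"


def spark_submit_args(app: str, driver_memory: str,
                      listener_enabled: bool) -> list[str]:
    """Build a spark-submit argument list that sets the proposed flag at launch."""
    value = "true" if listener_enabled else "false"
    return [
        "spark-submit",
        "--driver-memory", driver_memory,
        "--conf", f"{PROPOSED_KEY}={value}",
        app,
    ]


# "feature_job.py" is a placeholder application name.
args = spark_submit_args("feature_job.py", "2g", listener_enabled=False)
print(shlex.join(args))
```

Disabling the listener this way would presumably trade the SQL tab's per-query metrics for a smaller driver heap, so it suits batch jobs where that UI detail is not needed.
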



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

