[
https://issues.apache.org/jira/browse/SPARK-54354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hongze Zhang updated SPARK-54354:
---------------------------------
Summary: Driver / Executor hangs when there's not enough JVM heap memory
for broadcast hashed relation (was: Driver / Executor hangs when there's
enough JVM heap memory for broadcast hashed relation)
> Driver / Executor hangs when there's not enough JVM heap memory for broadcast
> hashed relation
> ---------------------------------------------------------------------------------------------
>
> Key: SPARK-54354
> URL: https://issues.apache.org/jira/browse/SPARK-54354
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 4.0.1
> Reporter: Hongze Zhang
> Priority: Major
>
> This is a bug that happens specifically when very large UnsafeHashedRelations
> are created for broadcast hash joins.
>
> The rationale of the bug is as follows:
>
> This code:
> [https://github.com/apache/spark/blob/6cb88c10126bde79076ce5c8d7574cc5c9524746/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala#L142-L150]
> creates a unified memory manager with Int.Max as the heap memory size for
> the new hashed relation. This practice essentially creates a condition where
> the actual JVM heap is significantly smaller than the heap size the unified
> memory manager believes it has.
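> For reference, the linked lines have roughly the following shape (paraphrased,
> not verbatim; the exact constants and signatures should be checked against the
> commit, and the UnifiedMemoryManager constructor is Spark-internal):
> {code:scala}
> import org.apache.spark.SparkConf
> import org.apache.spark.internal.config.MEMORY_OFFHEAP_ENABLED
> import org.apache.spark.memory.{TaskMemoryManager, UnifiedMemoryManager}
>
> // When no TaskMemoryManager is supplied by the caller, a standalone one is
> // created whose advertised heap size (Int.Max, per the description above)
> // bears no relation to the JVM heap that is actually available.
> val mm = Option(taskMemoryManager).getOrElse {
>   new TaskMemoryManager(
>     new UnifiedMemoryManager(
>       new SparkConf().set(MEMORY_OFFHEAP_ENABLED.key, "false"),
>       Int.MaxValue,       // maxHeapMemory: far larger than the real heap
>       Int.MaxValue / 2,   // onHeapStorageRegionSize
>       1),                 // numCores
>     0)                    // taskAttemptId
> }
> {code}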
> Combined with the recursive retry policy for this specific case:
> [https://github.com/apache/spark/blob/6cb88c10126bde79076ce5c8d7574cc5c9524746/core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java#L402]
> the oversized heap limit means the recursive retries keep being granted
> memory that the JVM cannot actually back, and they only stop once the
> advertised limit is exhausted, so the program will likely hang for a very
> long time. In a typical local test, it could hang for as long as 2 hours.
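> To see why the oversized limit turns an allocation failure into a long hang
> rather than a fast OOM, here is a minimal, self-contained sketch of the retry
> loop (hypothetical names and numbers; the only behavior carried over from the
> linked TaskMemoryManager code is that a failed grant stays accounted and the
> allocation is retried recursively):
> {code:scala}
> object RetryHangSketch {
>   // What the memory manager believes it has; the real JVM heap is assumed
>   // already filled by the large hashed relation being built.
>   val advertisedLimit: Long = Int.MaxValue   // ~2 GiB of "virtual" heap
>   val realHeapFull: Boolean = true           // every real allocation now fails
>
>   var accounted: Long = 0L                   // memory the manager thinks is in use
>
>   // Shape of the recursive retry: a grant whose real allocation fails stays
>   // in the accounting, so the loop ends only once the *advertised* limit is
>   // exhausted, which can take an enormous number of iterations.
>   @annotation.tailrec
>   def allocatePage(size: Long, attempts: Long = 0): Long = {
>     if (accounted + size > advertisedLimit) attempts   // manager finally out
>     else if (!realHeapFull) attempts                   // real allocation worked
>     else {
>       accounted += size                                // failed grant stays reserved
>       allocatePage(size, attempts + 1)                 // recursive retry
>     }
>   }
>
>   def main(args: Array[String]): Unit = {
>     // 64 KiB grants against a ~2 GiB advertised limit: ~32k futile retries,
>     // each of which in real Spark can also trigger spill attempts.
>     println(s"retries before giving up: ${allocatePage(64 * 1024)}")
>   }
> }
> {code}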