[
https://issues.apache.org/jira/browse/SPARK-12358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon updated SPARK-12358:
---------------------------------
Labels: bulk-closed (was: )
> Spark SQL query with lots of small tables under broadcast threshold leading
> to java.lang.OutOfMemoryError
> ---------------------------------------------------------------------------------------------------------
>
> Key: SPARK-12358
> URL: https://issues.apache.org/jira/browse/SPARK-12358
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.2
> Reporter: Deenar Toraskar
> Priority: Major
> Labels: bulk-closed
>
> Hi,
> I have a Spark SQL query that joins many small tables (5 or more), all below
> the broadcast threshold. Looking at the query plan, Spark broadcasts all of
> these tables together without checking whether sufficient memory is
> available. This leads to
> Exception in thread "broadcast-hash-join-2" java.lang.OutOfMemoryError: Java
> heap space
> errors, which cause the executors to die and the query to fail.
> I worked around this issue by reducing spark.sql.autoBroadcastJoinThreshold
> to stop the bigger tables in the query from being broadcast.
> A fix would be to ensure that, in addition to the per-table threshold
> (spark.sql.autoBroadcastJoinThreshold), there is a cumulative broadcast
> threshold per query (say spark.sql.autoBroadcastJoinThresholdCumulative), so
> that only data up to that limit is broadcast, preventing executors from
> running out of memory.
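A minimal sketch of the workaround described in the report, assuming the Spark 1.5-era SQLContext configuration API; the 10 MB value is illustrative only and should be tuned to the executor heap size:

```scala
// Workaround sketch: lower the per-table auto-broadcast threshold so the
// larger of the "small" tables fall back to a shuffle-based join instead
// of all being broadcast at once.
// 10 MB here is an illustrative value, not a recommendation.
sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold",
  (10 * 1024 * 1024).toString)

// Setting the threshold to -1 disables automatic broadcast joins entirely,
// at the cost of shuffling even the smallest tables.
sqlContext.setConf("spark.sql.autoBroadcastJoinThreshold", "-1")
```

The same setting can also be supplied at submit time (e.g. via --conf) so that no code change is needed.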
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]