Patrick Woody created SPARK-15038:
-------------------------------------
Summary: Add ability to do broadcasts in SQL at execution time
Key: SPARK-15038
URL: https://issues.apache.org/jira/browse/SPARK-15038
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.6.1
Reporter: Patrick Woody
Currently, the automatic broadcasting done in Spark SQL is asynchronous and
happens at query planning time. For a large query with many broadcasts, this
can create a large amount of memory pressure, and possibly OOMs, all at once,
even though it isn't actually necessary to build every broadcast up front.
The current workaround for these types of queries is to disable broadcast
joins (for example, by setting spark.sql.autoBroadcastJoinThreshold to -1),
which can be prohibitive performance-wise. This ticket proposes adding a
configuration option that toggles between doing these broadcasts
eagerly/asynchronously, as today, and doing them lazily at execution time.
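
To illustrate the difference in memory behavior, here is a minimal toy model
in plain Python. It is not Spark code: BroadcastTable, MemoryTracker, and
peak_memory_mb are hypothetical names made up for this sketch, and it assumes
(as a simplification) that each join runs on its own and that its broadcast
can be freed as soon as that join finishes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BroadcastTable:
    """Hypothetical stand-in for a broadcast relation."""
    name: str
    size_mb: int  # memory held while the broadcast is materialized


class MemoryTracker:
    """Tracks current and peak memory held by materialized broadcasts."""

    def __init__(self) -> None:
        self.current_mb = 0
        self.peak_mb = 0

    def allocate(self, mb: int) -> None:
        self.current_mb += mb
        self.peak_mb = max(self.peak_mb, self.current_mb)

    def release(self, mb: int) -> None:
        self.current_mb -= mb


def peak_memory_mb(tables: List[BroadcastTable], lazy: bool) -> int:
    """Return the peak memory used when running one join per table.

    Assumption of this sketch: joins run one after another, and a
    broadcast is released once its join has finished.
    """
    mem = MemoryTracker()
    if lazy:
        # Lazy: build each broadcast only when its join executes.
        for t in tables:
            mem.allocate(t.size_mb)
            mem.release(t.size_mb)  # join done, broadcast freed
    else:
        # Eager: build every broadcast up front, before any join runs.
        for t in tables:
            mem.allocate(t.size_mb)
        for t in tables:
            mem.release(t.size_mb)  # joins then run and free their data
    return mem.peak_mb
```

Under these assumptions, the eager peak is the sum of all broadcast sizes,
while the lazy peak is only the size of the largest single broadcast. Real
queries that pipeline several broadcast joins in one stage would sit somewhere
in between.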