[jira] [Created] (SPARK-26305) Breakthrough the memory limitation of broadcast join

Lantao Jin (JIRA) Fri, 07 Dec 2018 05:24:01 -0800

Lantao Jin created SPARK-26305:
----------------------------------

             Summary: Breakthrough the memory limitation of broadcast join
                 Key: SPARK-26305
                 URL: https://issues.apache.org/jira/browse/SPARK-26305
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Lantao Jin



When a join between a large table and a small one suffers from data skew, we 
usually resolve it with a broadcast hint in SQL. However, the current broadcast 
join has several limitations, and the primary one is memory: the broadcast 
table must fit in memory on both the driver and the executors. 
Although it can spill to disk when memory is insufficient, it still causes 
OOM when the "small" table is small only in relative terms, not in absolute terms. 
In our company, many production SQL analysis jobs join and shuffle 
dozens or hundreds of terabytes. For example, if the large 
table is 100TB and the small one is 10,000 times smaller, it is still 10GB. In this 
case, the broadcast join cannot finish because the small table is still larger 
than expected. And if the join data is skewed, the sort-merge join always fails.

Hive has a skew join hint that triggers a two-stage task to handle the skewed 
keys and the normal keys separately. I guess Databricks Runtime has a similar 
implementation. However, a skew join hint requires SQL users to know the data in 
their tables intimately: they must know which keys are skewed in a join. That is 
very hard to know, since the data changes day by day and the join key 
differs from query to query. Users end up setting a huge partition number and 
trying their luck.
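
To make the two-stage idea concrete, here is a minimal plain-Python sketch of a join that handles skewed keys and normal keys separately. It is not Hive's or Spark's implementation. The function name `two_stage_join`, the `skew_threshold` and `num_partitions` parameters, and the frequency-count detection of hot keys are all assumptions made for illustration. In a real skew join hint the user supplies the hot keys, which is exactly the burden described above.

```python
from collections import Counter, defaultdict

def two_stage_join(big, small, skew_threshold=3, num_partitions=4):
    """Inner-join two lists of (key, value) pairs, handling hot keys separately.

    Hypothetical sketch: hot keys are detected by counting, whereas a
    skew join hint would have the user name them explicitly.
    """
    counts = Counter(k for k, _ in big)
    skewed = {k for k, c in counts.items() if c >= skew_threshold}

    # Stage 1: skewed keys. Only the matching small-side rows are kept
    # in a lookup table, and big-side rows are joined where they are,
    # so a hot key is never funnelled into a single partition.
    small_hot = defaultdict(list)
    for k, v in small:
        if k in skewed:
            small_hot[k].append(v)
    out = [(k, bv, sv)
           for k, bv in big if k in skewed
           for sv in small_hot.get(k, [])]

    # Stage 2: normal keys. Both sides are hash-partitioned by key and
    # each partition is joined independently.
    parts = [([], []) for _ in range(num_partitions)]
    for k, v in big:
        if k not in skewed:
            parts[hash(k) % num_partitions][0].append((k, v))
    for k, v in small:
        if k not in skewed:
            parts[hash(k) % num_partitions][1].append((k, v))
    for big_part, small_part in parts:
        lookup = defaultdict(list)
        for k, v in small_part:
            lookup[k].append(v)
        out.extend((k, bv, sv)
                   for k, bv in big_part
                   for sv in lookup.get(k, []))
    return out
```

The result is the same set of rows a plain inner join would produce. Only the routing of rows differs between hot and normal keys.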

So, is there a simple, blunt and efficient way to resolve this? Going back to the 
limitation: if the broadcast table did not need to fit in memory, in other 
words, if the driver and executors stored the broadcast table on disk only, the 
problem described above could be resolved.
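
As a rough illustration of the disk-only idea (not a proposed Spark design), the sketch below keeps the small side of a join in an indexed on-disk SQLite file and streams the large side past it, so the small table never has to be fully loaded into memory. The helper names `build_disk_table` and `disk_backed_join`, and the choice of SQLite as the on-disk store, are assumptions for this example only.

```python
import sqlite3

def build_disk_table(small_rows, path):
    """Write the small side to an indexed SQLite file at `path`.

    Hypothetical stand-in for a broadcast table that is stored on
    disk only rather than in memory.
    """
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE small (k, v)")
    conn.executemany("INSERT INTO small VALUES (?, ?)", small_rows)
    conn.execute("CREATE INDEX idx_small_k ON small (k)")
    conn.commit()
    return conn

def disk_backed_join(big_rows, conn):
    """Stream the big side and probe the on-disk table for each key."""
    for k, bv in big_rows:
        for (sv,) in conn.execute("SELECT v FROM small WHERE k = ?", (k,)):
            yield (k, bv, sv)
```

Each probe goes through the index on disk, so memory use stays roughly constant regardless of the small table's size. The price is one disk lookup per big-side row, which is the trade-off such a design would have to weigh.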

I will provide a design doc if you agree with this direction.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
