[ https://issues.apache.org/jira/browse/SPARK-18944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-18944.
-------------------------------
Resolution: Invalid
Questions belong on [email protected]
> Understanding BroadcastNestedLoopJoin and number of partitions
> --------------------------------------------------------------
>
> Key: SPARK-18944
> URL: https://issues.apache.org/jira/browse/SPARK-18944
> Project: Spark
> Issue Type: Question
> Components: SQL
> Affects Versions: 1.6.2, 2.0.2
> Environment: Spark 1.6.2
> Reporter: David Hodeffi
> Priority: Trivial
> Labels: question
>
> I am joining two DataFrames, one small and one big.
> The optimizer chooses BroadcastNestedLoopJoin.
> The big DataFrame has 200 partitions, while the small DataFrame has 5
> partitions.
> The joined DataFrame ends up with 205 partitions
> (joined.rdd.partitions.size). I tried to understand where this number comes
> from and figured out that BroadcastNestedLoopJoin actually performs a union.
> Code:
> case class BroadcastNestedLoopJoin(...) {
>   def doExecute(): RDD[InternalRow] = {
>     ...
>     ...
>     // The result is the union of two RDDs, so their partition counts add up.
>     sparkContext.union(
>       matchedStreamRows,
>       sparkContext.makeRDD(notMatchedBroadcastRows)
>     )
>   }
> }
> Can someone explain what exactly the code in doExecute() does? Can you
> elaborate on all the null checks and why nulls can occur? Why do we
> have 205 partitions? A link to a JIRA discussion that explains the code
> would help.
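
The partition arithmetic in the question can be reproduced outside the join operator. The following is a minimal sketch, not the operator's actual code: `UnionPartitionsSketch`, `partitionCounts`, and the sample data are hypothetical names and values chosen for illustration, and the only Spark calls used are `SparkContext.parallelize`, `makeRDD`, and `union`. One assumption to note: the quoted snippet calls `makeRDD` without a slice count, in which case Spark falls back to `defaultParallelism`; the sketch passes 5 explicitly only to match the numbers reported above.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object UnionPartitionsSketch {
  // Returns (partitions of the streamed stand-in, partitions of the leftover
  // stand-in, partitions of their union).
  def partitionCounts(sc: SparkContext,
                      streamedPartitions: Int,
                      leftoverSlices: Int): (Int, Int, Int) = {
    // Hypothetical stand-in for matchedStreamRows: an RDD with one partition
    // per partition of the big (streamed) side.
    val matched = sc.parallelize(1 to 1000, streamedPartitions)
    // Hypothetical stand-in for sparkContext.makeRDD(notMatchedBroadcastRows):
    // a local collection turned into an RDD with the given number of slices.
    val leftover = sc.makeRDD(Seq(-1, -2, -3), leftoverSlices)
    // Neither input has a partitioner, so the union keeps every input
    // partition and the counts simply add up.
    val unioned = sc.union(matched, leftover)
    (matched.partitions.length, leftover.partitions.length, unioned.partitions.length)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("union-partitions-sketch")
    val sc = new SparkContext(conf)
    try {
      println(partitionCounts(sc, 200, 5)) // prints (200,5,205)
    } finally {
      sc.stop()
    }
  }
}
```

Under these assumptions, the 205 in the report is the 200 partitions of the streamed side plus the partitions of the RDD built from the unmatched broadcast rows.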