[ 
https://issues.apache.org/jira/browse/SPARK-18944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-18944.
-------------------------------
    Resolution: Invalid

Questions belong on [email protected]

> Understanding BroadcastNestedLoopJoin and number of partitions
> --------------------------------------------------------------
>
>                 Key: SPARK-18944
>                 URL: https://issues.apache.org/jira/browse/SPARK-18944
>             Project: Spark
>          Issue Type: Question
>          Components: SQL
>    Affects Versions: 1.6.2, 2.0.2
>         Environment: Spark 1.6.2
>            Reporter: David Hodeffi
>            Priority: Trivial
>              Labels: question
>
> I have two dataframes which I am joining. small and big  size dataframess. 
> The optimizer  suggest to use BroadcastNestedLoopJoin. 
> number of partitions for the big dataframe is 200 while small dataframe has 5 
> partitions.
> The joined dataframe  results with 205 partitions 
> (joined.rdd.partitions.size), I have tried to understand  why is this number 
> and figured out that BroadCastNestedLoopJoin is actually a union. 
> code : 
> case class BroadcastNestedLoopJoin{
>    def doExecuteo(): ={
>         ...
>         ...
>        sparkContext.union(
>             matchedStreamRows,
>             sparkContext.makeRDD(notMatchedBroadcastRows)
>       )
>   }
> }
> can someone explain what exactly the code of doExecute() do?  can you 
> elaborate about all the null checks and why can we have nulls ? Why do we 
> have 205 partions? link to a JIRA with discussion that can explain the code 
> can help.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to