Not sure I fully follow your question, but SparkStrategies.scala and Optimizer.scala are a good place to start if you want the details of the join implementation and its optimization.
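For example, here is a minimal sketch (against a recent Spark build with the SparkSession API; the object name and column names are made up for illustration) of where Optimizer.scala's work becomes visible: a filter written above an equi join gets pushed below it in the optimized logical plan, which extended explain() prints.

import org.apache.spark.sql.SparkSession

object JoinPlanInspection {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("join-plan-inspection")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val left  = Seq((1, "a"), (2, "b")).toDF("id", "l")
    val right = Seq((1, "x"), (3, "y")).toDF("id", "r")

    // A filter written above the join in the query...
    val joined = left.join(right, left("id") === right("id"))
      .filter(right("r") === "x")

    // Extended explain prints the parsed, analyzed, and optimized logical
    // plans plus the physical plan; in the optimized plan the filter
    // should appear pushed below the join, onto the right-hand relation.
    joined.explain(true)

    spark.stop()
  }
}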
-----Original Message-----
From: Andrew Ash [mailto:[email protected]]
Sent: Friday, January 16, 2015 4:52 AM
To: Reynold Xin
Cc: Alessandro Baretta; [email protected]
Subject: Re: Join implementation in SparkSQL

What Reynold is describing is a performance optimization in the implementation, but the semantics of the join (a cartesian product plus a relational algebra filter) should be the same and produce the same results.

On Thu, Jan 15, 2015 at 1:36 PM, Reynold Xin <[email protected]> wrote:
> It's a bunch of strategies defined here:
>
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
>
> In most common use cases (e.g. inner equi join), filters are pushed
> below the join or into the join. Doing a cartesian product followed
> by a filter is too expensive.
>
> On Thu, Jan 15, 2015 at 7:39 AM, Alessandro Baretta <[email protected]> wrote:
>
> > Hello,
> >
> > Where can I find docs about how joins are implemented in SparkSQL?
> > In particular, I'd like to know whether they are implemented
> > according to their relational algebra definition, as filters on top
> > of a cartesian product.
> >
> > Thanks,
> >
> > Alex
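To make Reynold's point concrete, a quick sketch (hypothetical data, same recent-Spark assumptions as above) contrasting the physical plans of an equi join and a non-equi join:

import org.apache.spark.sql.SparkSession

object JoinStrategyContrast {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("join-strategy-contrast")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val orders = Seq((1, 100), (2, 250)).toDF("orderId", "amount")
    val limits = Seq((1, 200), (2, 300)).toDF("custId", "maxAmount")

    // Equi join: the equality predicate becomes a join key, so the
    // strategies in SparkStrategies.scala pick a broadcast-hash or
    // sort-merge join.
    orders.join(limits, orders("orderId") === limits("custId")).explain()

    // Non-equi join: no equality keys to hash or sort on, so the planner
    // falls back to a nested-loop / cartesian-style plan with the
    // predicate applied as a filter.
    orders.join(limits, orders("amount") > limits("maxAmount")).explain()

    spark.stop()
  }
}

The first explain() should show a hash- or sort-based join, while the second falls back to the cartesian-product-plus-filter shape, i.e. the relational algebra definition is only executed literally when nothing cheaper applies.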
