What Reynold is describing is a performance optimization in the
implementation; the semantics of the join (a cartesian product followed by a
relational-algebra filter) are unchanged, so both formulations produce the
same results.
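
A minimal sketch of both points, using the DataFrame API from later Spark
releases (SparkSession and crossJoin postdate this thread) and made-up
inputs: the two formulations return the same rows, and on recent versions
explain() should show the equality predicate planned as an equi join rather
than a CartesianProduct followed by a Filter.

import org.apache.spark.sql.SparkSession

object JoinSemanticsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("join-semantics-sketch")
      .getOrCreate()
    import spark.implicits._

    // Illustrative inputs; the names and columns are hypothetical.
    val left  = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "l")
    val right = Seq((1, "x"), (3, "y"), (4, "z")).toDF("id", "r")

    // Relational-algebra formulation: cartesian product, then a filter.
    val naive = left.crossJoin(right).filter(left("id") === right("id"))

    // Equi-join formulation.
    val joined = left.join(right, left("id") === right("id"))

    // Same semantics: both return the identical set of rows.
    assert(naive.collect().toSet == joined.collect().toSet)

    // On recent Spark versions the equality predicate is pushed into the
    // join, so neither plan should be a CartesianProduct followed by a
    // Filter.
    naive.explain()
    joined.explain()

    spark.stop()
  }
}

The SparkStrategies.scala file Reynold links below is where the physical
operator for each logical join is chosen.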

On Thu, Jan 15, 2015 at 1:36 PM, Reynold Xin <r...@databricks.com> wrote:

> It's a bunch of strategies defined here:
>
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
>
> In most common use cases (e.g. inner equi join), filters are pushed below
> the join or into the join. Doing a cartesian product followed by a filter
> is too expensive.
>
>
> On Thu, Jan 15, 2015 at 7:39 AM, Alessandro Baretta <alexbare...@gmail.com>
> wrote:
>
> > Hello,
> >
> > Where can I find docs about how joins are implemented in SparkSQL? In
> > particular, I'd like to know whether they are implemented according to
> > their relational algebra definition as filters on top of a cartesian
> > product.
> >
> > Thanks,
> >
> > Alex
> >
>
