Not sure I fully follow your question, but SparkStrategies.scala and Optimizer.scala are a good place to start if you want the details of the join implementation and its optimization.
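For example, here is a minimal sketch (against a recent Spark build with the SparkSession API; the object name and column names are made up for illustration) of where Optimizer.scala's work becomes visible: a filter written above an equi join gets pushed below it in the optimized logical plan, which extended explain() prints.

import org.apache.spark.sql.SparkSession

object JoinPlanInspection {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("join-plan-inspection")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val left  = Seq((1, "a"), (2, "b")).toDF("id", "l")
    val right = Seq((1, "x"), (3, "y")).toDF("id", "r")

    // A filter written above the join in the query...
    val joined = left.join(right, left("id") === right("id"))
      .filter(right("r") === "x")

    // Extended explain prints the parsed, analyzed, and optimized logical
    // plans plus the physical plan; in the optimized plan the filter
    // should appear pushed below the join, onto the right-hand relation.
    joined.explain(true)

    spark.stop()
  }
}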
-----Original Message-----
From: Andrew Ash [mailto:[email protected]]
Sent: Friday, January 16, 2015 4:52 AM
To: Reynold Xin
Cc: Alessandro Baretta; [email protected]
Subject: Re: Join implementation in SparkSQL

What Reynold is describing is a performance optimization in the implementation, but the semantics of the join (a cartesian product plus a relational algebra filter) should be the same and produce the same results.

On Thu, Jan 15, 2015 at 1:36 PM, Reynold Xin <[email protected]> wrote:
> It's a bunch of strategies defined here:
>
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
>
> In most common use cases (e.g. inner equi join), filters are pushed
> below the join or into the join. Doing a cartesian product followed
> by a filter is too expensive.
>
> On Thu, Jan 15, 2015 at 7:39 AM, Alessandro Baretta <[email protected]> wrote:
>
> > Hello,
> >
> > Where can I find docs about how joins are implemented in SparkSQL?
> > In particular, I'd like to know whether they are implemented
> > according to their relational algebra definition, as filters on top
> > of a cartesian product.
> >
> > Thanks,
> >
> > Alex
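To make Reynold's point concrete, a quick sketch (hypothetical data, same recent-Spark assumptions as above) contrasting the physical plans of an equi join and a non-equi join:

import org.apache.spark.sql.SparkSession

object JoinStrategyContrast {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("join-strategy-contrast")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val orders = Seq((1, 100), (2, 250)).toDF("orderId", "amount")
    val limits = Seq((1, 200), (2, 300)).toDF("custId", "maxAmount")

    // Equi join: the equality predicate becomes a join key, so the
    // strategies in SparkStrategies.scala pick a broadcast-hash or
    // sort-merge join.
    orders.join(limits, orders("orderId") === limits("custId")).explain()

    // Non-equi join: no equality keys to hash or sort on, so the planner
    // falls back to a nested-loop / cartesian-style plan with the
    // predicate applied as a filter.
    orders.join(limits, orders("amount") > limits("maxAmount")).explain()

    spark.stop()
  }
}

The first explain() should show a hash- or sort-based join, while the second falls back to the cartesian-product-plus-filter shape, i.e. the relational algebra definition is only executed literally when nothing cheaper applies.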
