Re: new Catalyst/SQL component merged into master

Usman Ghani Mon, 24 Mar 2014 11:11:28 -0700

How does it compare against Shark, and what is the future of Shark with
this new module in place?



On Sun, Mar 23, 2014 at 11:49 PM, Evan Chan <[email protected]> wrote:

> Hi Michael,
>
> Congrats, this is really neat!
>
> What thoughts do you have regarding adding indexing support and
> predicate pushdown to this SQL framework?  Right now we have custom
> bitmap indexing to speed up queries, so we're really curious about
> the architectural direction.
>
> -Evan
>
>
> On Fri, Mar 21, 2014 at 11:09 AM, Michael Armbrust
> <[email protected]> wrote:
> >>
> >> It would be great if there were any examples or use cases to look at.
> >>
> > There are examples in the Spark documentation.  Patrick posted an
> > updated copy here so people can see them before 1.0 is released:
> >
> > http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html
> >
> >> Does this feature have different use cases than Shark, or is it just
> >> cleaner because the Hive dependency is gone?
> >>
> > Depending on how you use this, there is still a dependency on Hive (by
> > default this is not the case; see the above documentation for more
> > details).  However, the dependency is on a stock version of Hive instead
> > of one modified by the AMPLab.  Furthermore, Spark SQL has its own
> > optimizer, instead of relying on the Hive optimizer.  Long term, this is
> > going to give us a lot more flexibility to optimize queries specifically
> > for the Spark execution engine.  We are actively porting over the best
> > parts of Shark (specifically the in-memory columnar representation).
> >
> > Shark still has some features that are missing in Spark SQL, including
> > SharkServer (and years of testing).  Once Spark SQL graduates from alpha
> > status, it'll likely become the new backend for Shark.
>
>
>
> --
> Evan Chan
> Staff Engineer
> [email protected]  |
>
