Re: new Catalyst/SQL component merged into master

Michael Armbrust Fri, 21 Mar 2014 11:11:28 -0700

>
> It would be great if there were some examples or use cases to look at.
>
There are examples in the Spark documentation. Patrick posted an updated
copy here so people can see them before 1.0 is released:
http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html


> Does this feature have different use cases than Shark, or is it just cleaner
> because the Hive dependency is gone?
>
Depending on how you use it, there may still be a dependency on Hive (by
default there is not; see the documentation above for more details).
However, the dependency is on a stock version of Hive rather than one
modified by the AMPLab. Furthermore, Spark SQL has its own optimizer instead
of relying on the Hive optimizer. In the long term, this will give us much
more flexibility to optimize queries specifically for the Spark execution
engine. We are actively porting over the best parts of Shark (specifically
the in-memory columnar representation).

Shark still has some features that Spark SQL lacks, including SharkServer
(and years of testing). Once Spark SQL graduates from alpha status, it will
likely become the new backend for Shark.