Re: SQL language vs DataFrame API

Xiao Li Wed, 09 Dec 2015 11:02:59 -0800

Hi, Michael,

Does that mean SqlContext will be built on HiveQL in the near future?


Thanks,

Xiao Li


2015-12-09 10:36 GMT-08:00 Michael Armbrust <mich...@databricks.com>:

> I think that it is generally good to have parity when the functionality is
> useful.  However, in some cases various features are there just to maintain
> compatibility with other systems.  For example, CACHE TABLE is eager because
> Shark's cache table was.  df.cache() is lazy because Spark's cache is.
> Does that mean that we need to add some eager caching mechanism to
> dataframes to have parity?  Probably not; users can just call .count() if
> they want to force materialization.
>
> Regarding the differences between HiveQL and the SQLParser, I think we
> should get rid of the SQL parser.  It's kind of a hack that I built just so
> that there was some SQL story for people who didn't compile with Hive.
> Moving forward, I'd like to see the distinction between the HiveContext and
> SQLContext removed and we can standardize on a single parser.  For this
> reason I'd be opposed to spending a lot of dev/reviewer time on adding
> features there.
>
> On Wed, Dec 9, 2015 at 8:34 AM, Cristian O <
> cristian.b.op...@googlemail.com> wrote:
>
>> Hi,
>>
>> I was wondering what the "official" view is on feature parity between the SQL
>> and DF APIs. Docs are pretty sparse on the SQL front, and it seems that
>> some features are, at various times, supported in only one of the Spark SQL
>> dialect, the HiveQL dialect and the DF API. DF.cube(), DISTRIBUTE BY and
>> CACHE LAZY are some examples.
>>
>> Is there an explicit goal of having consistent support for all features
>> in both DF and SQL?
>>
>> Thanks,
>> Cristian
>>
>
>
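
For readers following the caching point in Michael's message, here is a minimal PySpark sketch of the eager/lazy difference. It is only an illustration under assumptions: it uses the later SparkSession API (spark.range, createOrReplaceTempView), not the SQLContext/HiveContext of the 2015 thread, and the function name, view names and the n parameter are hypothetical.

```python
def show_cache_semantics(spark, n=1000):
    """Contrast lazy and eager caching.

    `spark` is assumed to be a pyspark.sql.SparkSession created by the
    caller, e.g. SparkSession.builder.getOrCreate().
    """
    # DataFrame API: cache() is lazy, it only marks the plan for caching.
    df = spark.range(n)
    df.cache()
    # Running an action such as count() forces materialization.
    rows = df.count()

    # SQL: CACHE TABLE is eager, the table is materialized immediately.
    doubled = spark.range(n).selectExpr("id * 2 AS doubled")
    doubled.createOrReplaceTempView("doubled_ids")  # hypothetical view name
    spark.sql("CACHE TABLE doubled_ids")

    # SQL: CACHE LAZY TABLE defers materialization until first use.
    tripled = spark.range(n).selectExpr("id * 3 AS tripled")
    tripled.createOrReplaceTempView("tripled_ids")  # hypothetical view name
    spark.sql("CACHE LAZY TABLE tripled_ids")

    return rows
```

The df.cache() followed by df.count() pair is the DataFrame-side equivalent of the eager CACHE TABLE statement, which is why a separate eager caching method is arguably unnecessary.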
