That sounds great! Once a decision is made, please let us know, and we can
add more features and make it ANSI SQL compliant.

Thank you!

Xiao Li


2015-12-09 11:31 GMT-08:00 Michael Armbrust <mich...@databricks.com>:

> I don't plan to abandon HiveQL compatibility, but I'd like to see us move
> towards something with more SQL compliance (perhaps just newer versions of
> the HiveQL parser).  Exactly which parser will do that for us is under
> investigation.
>
> On Wed, Dec 9, 2015 at 11:02 AM, Xiao Li <gatorsm...@gmail.com> wrote:
>
>> Hi, Michael,
>>
>> Does that mean SQLContext will be built on HiveQL in the near future?
>>
>> Thanks,
>>
>> Xiao Li
>>
>>
>> 2015-12-09 10:36 GMT-08:00 Michael Armbrust <mich...@databricks.com>:
>>
>>> I think that it is generally good to have parity when the functionality
>>> is useful.  However, in some cases various features exist just to
>>> maintain compatibility with other systems.  For example, CACHE TABLE is
>>> eager because Shark's cache table was, while df.cache() is lazy because
>>> Spark's cache is.  Does that mean we need to add some eager caching
>>> mechanism to dataframes to have parity?  Probably not; users can just
>>> call .count() if they want to force materialization.
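
[To make the eager-vs-lazy caching distinction above concrete, here is a
minimal toy sketch in plain Python, not Spark itself; LazyDataset and
eager_cache are made-up names for illustration only.]

```python
# Toy illustration (not Spark) of eager vs. lazy caching: cache() only
# *marks* the dataset for caching, and nothing is computed until an
# action such as count() runs -- which also materializes the cache.

class LazyDataset:
    def __init__(self, compute):
        self._compute = compute        # deferred computation
        self._cache = None
        self._cache_requested = False

    def cache(self):
        # Like df.cache(): just records intent; no work happens here.
        self._cache_requested = True
        return self

    def count(self):
        # An action: triggers computation and, if caching was
        # requested, materializes the cache on first use.
        if self._cache is not None:
            rows = self._cache
        else:
            rows = self._compute()
            if self._cache_requested:
                self._cache = rows
        return len(rows)

def eager_cache(ds):
    # Shark-style "CACHE TABLE" behavior: force materialization
    # immediately by running an action right after marking the cache.
    ds.cache()
    ds.count()
    return ds

ds = LazyDataset(lambda: [1, 2, 3])
ds.cache()
print(ds._cache is None)   # True: lazy, nothing computed yet
print(ds.count())          # 3: action ran, cache now populated
```

This mirrors the point above: DataFrames need no special eager-caching API,
because calling an action like .count() right after .cache() gives eager
behavior for free.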
>>>
>>> Regarding the differences between HiveQL and the SQLParser, I think we
>>> should get rid of the SQL parser.  It's kind of a hack that I built just
>>> so that there was some SQL story for people who didn't compile with Hive.
>>> Moving forward, I'd like to see the distinction between HiveContext and
>>> SQLContext removed so that we can standardize on a single parser.  For
>>> this reason I'd be opposed to spending a lot of dev/reviewer time on
>>> adding features there.
>>>
>>> On Wed, Dec 9, 2015 at 8:34 AM, Cristian O <
>>> cristian.b.op...@googlemail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I was wondering what the "official" view is on feature parity between
>>>> the SQL and DF APIs. Docs are pretty sparse on the SQL front, and it
>>>> seems that some features are only supported, at various times, in just
>>>> one of the Spark SQL dialect, the HiveQL dialect, and the DF API.
>>>> DF.cube(), DISTRIBUTE BY, and CACHE LAZY are some examples.
>>>>
>>>> Is there an explicit goal of having consistent support for all features
>>>> in both the DF and SQL APIs?
>>>>
>>>> Thanks,
>>>> Cristian
>>>>
>>>
>>>
>>
>
