Hi Michael,

Does that mean SQLContext will be built on HiveQL in the near future?

Thanks,
Xiao Li

2015-12-09 10:36 GMT-08:00 Michael Armbrust <mich...@databricks.com>:

> I think that it is generally good to have parity when the functionality is
> useful. However, in some cases various features are there just to maintain
> compatibility with other systems. For example, CACHE TABLE is eager because
> Shark's cache table was. df.cache() is lazy because Spark's cache is.
> Does that mean that we need to add some eager caching mechanism to
> dataframes to have parity? Probably not; users can just call .count() if
> they want to force materialization.
>
> Regarding the differences between HiveQL and the SQLParser, I think we
> should get rid of the SQL parser. It's kind of a hack that I built just so
> that there was some SQL story for people who didn't compile with Hive.
> Moving forward, I'd like to see the distinction between the HiveContext and
> SQLContext removed and we can standardize on a single parser. For this
> reason I'd be opposed to spending a lot of dev/reviewer time on adding
> features there.
>
> On Wed, Dec 9, 2015 at 8:34 AM, Cristian O <
> cristian.b.op...@googlemail.com> wrote:
>
>> Hi,
>>
>> I was wondering what the "official" view is on feature parity between SQL
>> and DF APIs. Docs are pretty sparse on the SQL front, and it seems that
>> some features are only supported at various times in only one of Spark SQL
>> dialect, HiveQL dialect and DF API. DF.cube(), DISTRIBUTE BY, CACHE LAZY
>> are some examples.
>>
>> Is there an explicit goal of having consistent support for all features
>> in both DF and SQL?
>>
>> Thanks,
>> Cristian
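
For readers following the thread, the eager versus lazy caching distinction in the quoted reply (CACHE TABLE is eager, df.cache() is lazy, and .count() forces materialization) can be illustrated with a toy model in plain Python. This is only a sketch and does not use Spark: the class ToyDataset, its cache_eagerly() method, and the compute_calls counter are hypothetical names invented for the illustration.

```python
class ToyDataset:
    """Toy stand-in for a DataFrame. Hypothetical; not Spark's API."""

    def __init__(self, compute):
        self._compute = compute        # callable that produces the rows
        self._cache_requested = False  # set by cache(), nothing computed yet
        self._cached_rows = None       # filled in on first materialization
        self.compute_calls = 0         # how often the source was recomputed

    def _rows(self):
        # Serve from the cache if it has already been materialized.
        if self._cached_rows is not None:
            return self._cached_rows
        self.compute_calls += 1
        rows = list(self._compute())
        if self._cache_requested:
            self._cached_rows = rows
        return rows

    def cache(self):
        """Lazy: only marks the dataset for caching."""
        self._cache_requested = True
        return self

    def cache_eagerly(self):
        """Eager: mark for caching and materialize immediately."""
        self.cache()
        self.count()
        return self

    def count(self):
        """An action: forces the rows to be computed (or read from cache)."""
        return len(self._rows())


# Lazy caching: nothing is computed until an action such as count() runs.
lazy = ToyDataset(lambda: range(5)).cache()
calls_before_action = lazy.compute_calls   # 0, cache() did no work
lazy.count()                               # first action fills the cache
lazy.count()                               # second action reuses the cache
calls_after_actions = lazy.compute_calls   # 1

# Eager caching: the work happens at the moment caching is requested.
eager = ToyDataset(lambda: range(5)).cache_eagerly()
calls_eager = eager.compute_calls          # 1
```

In this model the eager variant is just the lazy one followed by an action, which mirrors the suggestion in the reply that users who want eager behavior can call .count() after caching.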