Re: using the Hive SQL parser in Spark

Gopal Vijayaraghavan Fri, 18 Dec 2015 12:36:40 -0800

>We have looked into various options, and it looks like the best option is
>to copy the ANTLR grammar file from Hive into Spark. Because the grammar
>file is tightly coupled with Hive's semantic analysis, we need to refactor
>some code to use them so it will end up becoming the .g file plus some
>coupled code.


Is the eventual goal to contribute that fork back into Hive & have Hive
devs maintain a compatible parser for SparkSQL?

Would that affect Hive's ability to refactor the SQL parser in the future,
or is this a one-time-only deal?

>parser. From Hive's perspective this does not provide any immediate
>benefits. From Spark's perspective, we iterate very quickly so having to
>depend on an external component also slow down our development. We also
>have some requirements that simply don't apply in other projects (e.g.
>being able to parse DataFrame expressions).

From that, I assume this involves some form of cut-and-paste duplication of
the code into the SparkSQL project, with that version diverging from
Hive's.

> Thanks a lot for developing this parser, and we will try our best to
> contribute back as we fix bugs. I will also make sure we have the proper
> acknowledgment when we do this.


Under the Apache license, there's no actual restriction against a hostile
embrace-extend by copying Hive's code verbatim, as long as the fork retains
the license notices.

The maintainability concerns are mostly around whether this is intended as
an ongoing relationship, including any compatibility commitments from
hive-dev@.


Cheers,
Gopal
