Thanks for highlighting the parts on types/operators/functions/...; that does make things more complicated. +1: as a short/mid-term solution, the proposal is reasonable. We could follow up in the future to handle it in Calcite Babel if possible.
Mingmin

On Tue, Aug 6, 2019 at 3:57 PM Rui Wang <ruw...@google.com> wrote:

> Hi Mingmin,
>
> Honestly, I don't have an answer: a SQL dialect is complicated, and I
> don't understand Calcite well enough (Calcite has a big repo). Based on my
> reading of CALCITE-2280
> <https://issues.apache.org/jira/browse/CALCITE-2280>, the closer a dialect
> is to standard SQL, the fewer blockers we will have in supporting it in
> the Calcite Babel parser.
>
> However, this is a good question, and it raises an aspect that I find
> people usually ignore: supporting a SQL dialect means more than supporting
> a syntax. It also includes data types, built-in SQL functions, operators,
> and much else.
>
> In particular, I found the following incompatibilities between Calcite and
> ZetaSQL during development:
> 1. Calcite does not support the Struct/Row type well, because Calcite
> flattens Rows when reading from tables by adding an extra Projection on
> top of tables.
> 2. I had trouble supporting the DATETIME (or timestamp without time zone)
> type.
> 3. Huge incompatibilities in SQL functions. E.g., the return type is
> different for AVG(long), and there are many more.
> 4. I am not sure whether Calcite has the same set of type casting rules as
> BigQuery (my impression is that there are differences).
>
> I would say that in the short/mid term, it's much easier to use the
> logical plan as an IR to implement another SQL dialect for BeamSQL
> (LinkedIn has a similar practice; see their blog post
> <https://engineering.linkedin.com/blog/2019/01/bridging-offline-and-nearline-computations-with-apache-calcite>).
>
> For the longer term, it would be interesting to see how we can add
> BigQuery syntax (plus its data types and SQL functions) to the Calcite
> Babel parser.
>
> -Rui
>
> On Tue, Aug 6, 2019 at 2:49 PM Mingmin Xu <mingm...@gmail.com> wrote:
>
>> I just took a look at https://issues.apache.org/jira/browse/CALCITE-2280 ,
>> which introduced the Babel parser in Calcite to support varied dialects;
>> this may be an easier way to support BigQuery syntax. @Rui, did you notice
>> any big differences between the Calcite engine and ZetaSQL, such as in
>> parsing or optimization? If that's the case, it makes sense to build the
>> alternative switch on the Beam side.
>>
>> On Sun, Aug 4, 2019 at 4:47 PM Rui Wang <ruw...@google.com> wrote:
>>
>>> Mingmin - it sounds like an awesome idea to translate from SparkSQL.
>>> It would be even more exciting if we could translate Spark
>>> Structured Streaming code in a similar way, which would let existing
>>> Spark SQL/Structured Streaming pipelines run on Beam.
>>>
>>> Reuven - Thanks for bringing it up. I searched dev@calcite and
>>> only found [1]. From that thread, I see that adding ZetaSQL to Calcite
>>> itself is still under discussion. I would also like to hear from anyone
>>> who knows of more progress on this work than the thread shows.
>>>
>>> [1]:
>>> http://mail-archives.apache.org/mod_mbox/calcite-dev/201905.mbox/%3CCAMj=j=-sPWgxzAgusnx8OYvYDYDcDY=dupe6poytrxhjri9...@mail.gmail.com%3E
>>>
>>> -Rui
>>>
>>> On Sun, Aug 4, 2019 at 3:54 PM Reuven Lax <re...@google.com> wrote:
>>>
>>>> I hear rumours that the Calcite project is planning to add a
>>>> ZetaSQL-compatible parser to Calcite itself, in which case there will
>>>> be a Java parser we can use as well. Does anyone know if this work is
>>>> still going on?
>>>>
>>>> On Sat, Aug 3, 2019 at 8:41 PM Manu Zhang <owenzhang1...@gmail.com>
>>>> wrote:
>>>>
>>>>>> A question to the community, does the size of the change require any
>>>>>> process besides the usual PR reviews?
>>>>>
>>>>> I think so. This is a big change and has come as kind of a surprise
>>>>> (sorry if I've missed previous discussions).
>>>>>
>>>>> Rui, could you explain more about how things will play out between
>>>>> BeamSQL and ZetaSQL? (A design doc including the pluggable interface
>>>>> would be perfect.) From GitHub, ZetaSQL is mainly in C++, so is what
>>>>> you are doing a port or a connector to ZetaSQL? Do we need to depend on
>>>>> https://github.com/google/zetasql ? ZetaSQL looks interesting, but I
>>>>> could barely find any docs for end users.
>>>>>
>>>>> Also, I'd prefer the PR to be split in two: one for the pluggable
>>>>> interface and one for ZetaSQL.
>>>>>
>>>>> Thanks,
>>>>> Manu
>>>>>
>>>>> On Sat, Aug 3, 2019 at 10:06 AM Ahmet Altay <al...@google.com> wrote:
>>>>>
>>>>>> Thank you Rui for the heads up.
>>>>>>
>>>>>> A question to the community, does the size of the change require any
>>>>>> process besides the usual PR reviews?
>>>>>>
>>>>>> On Fri, Aug 2, 2019 at 10:23 AM Rui Wang <ruw...@google.com> wrote:
>>>>>>
>>>>>>> Hi community,
>>>>>>>
>>>>>>> I have been working on supporting ZetaSQL[1] as a SQL dialect in
>>>>>>> BeamSQL. ZetaSQL is a SQL analyzer open sourced by Google. Here is
>>>>>>> ZetaSQL's documentation[2].
>>>>>>>
>>>>>>> Briefly, the design for integrating ZetaSQL with BeamSQL is this: I
>>>>>>> made a pluggable query planner interface in BeamSQL, so we can easily
>>>>>>> plug in a new planner[3] (in my case, the ZetaSQL planner). In fact,
>>>>>>> anyone can add new planners this way (e.g. for a PostgreSQL dialect).
>>>>>>>
>>>>>>> I want to contribute the ZetaSQL planner and its related code (~10k)
>>>>>>> to the Beam repo (#9210 <https://github.com/apache/beam/pull/9210>).
>>>>>>> This contribution barely touches existing Beam code (because the idea
>>>>>>> is a pluggable planner).
>>>>>>>
>>>>>>> *Acknowledgement*
>>>>>>> Thanks to all the people who provided help during Beam ZetaSQL
>>>>>>> development: Matthew Brown, Brian Hulette, Andrew Pilloud, Kenneth
>>>>>>> Knowles, Anton Kedin and Mikhail Gryzykhin.
>>>>>>> This list is not exhaustive; thanks also to those whose
>>>>>>> contributions are not listed.
>>>>>>>
>>>>>>> [1]: https://github.com/google/zetasql
>>>>>>> [2]: https://github.com/google/zetasql/tree/master/docs
>>>>>>> [3]:
>>>>>>> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/QueryPlanner.java
>>>>>>>
>>>>>>> -Rui
>>>>>>>
>>>>>>
>>
>> --
>> ----
>> Mingmin
>>
>

--
----
Mingmin
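
The pluggable-planner idea from the original proposal could look roughly like the sketch below. This is not the actual Beam `QueryPlanner` interface linked as [3]: the names `SqlPlanner`, `PlannerRegistry`, `ToyCalcitePlanner`, and `ToyZetaSqlPlanner` are hypothetical, and the "plan" is just a string so that the example stays self-contained. It only illustrates how a second dialect can be added without touching the code that calls the planner.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical planner interface: turns a SQL string into some plan.
// The real Beam interface is not reproduced here.
interface SqlPlanner {
    String plan(String sql);
}

// Toy stand-in for a Calcite-based planner (assumption, illustration only).
class ToyCalcitePlanner implements SqlPlanner {
    @Override
    public String plan(String sql) {
        return "calcite-plan(" + sql + ")";
    }
}

// Toy stand-in for a ZetaSQL-based planner (assumption, illustration only).
class ToyZetaSqlPlanner implements SqlPlanner {
    @Override
    public String plan(String sql) {
        return "zetasql-plan(" + sql + ")";
    }
}

// Hypothetical registry that picks a planner by dialect name.
class PlannerRegistry {
    private final Map<String, SqlPlanner> planners = new HashMap<>();

    void register(String dialect, SqlPlanner planner) {
        planners.put(dialect, planner);
    }

    String plan(String dialect, String sql) {
        SqlPlanner planner = planners.get(dialect);
        if (planner == null) {
            throw new IllegalArgumentException(
                "No planner registered for dialect: " + dialect);
        }
        return planner.plan(sql);
    }
}

class PlannerDemo {
    // Builds a registry with both toy planners plugged in.
    static PlannerRegistry defaultRegistry() {
        PlannerRegistry registry = new PlannerRegistry();
        registry.register("calcite", new ToyCalcitePlanner());
        registry.register("zetasql", new ToyZetaSqlPlanner());
        return registry;
    }

    public static void main(String[] args) {
        PlannerRegistry registry = defaultRegistry();
        // The caller only names a dialect; it does not depend on a planner class.
        System.out.println(registry.plan("calcite", "SELECT 1"));
        System.out.println(registry.plan("zetasql", "SELECT 1"));
    }
}
```

Under this sketch, adding another dialect (the proposal mentions PostgreSQL as an example) would mean writing one more `SqlPlanner` implementation and registering it; the calling code would stay unchanged.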