Re: About Stream SQL

Julian Hyde Thu, 04 Feb 2016 00:37:00 -0800

I totally agree with you. (Sorry for the delayed response; this week has been 
very busy.)

There is a tendency among vendors (and projects) to think that their technology 
is unique and superior to everyone else’s, and to want to showcase it in their 
dialect of SQL. That is natural, and it’s OK, since it makes them strive to 
improve their technology.

However, they have to remember that end users don’t want something unique; 
they want something that solves their problem. They would like something that 
is standards-compliant, so that it is easy to learn, easy to hire developers 
for, and (if the worst comes to the worst) easy to migrate to a compatible 
competing technology.

I know the developers at Storm and Flink (and Samza too) and they understand 
the importance of collaborating on a standard.

I have been trying to play a dual role: supplying the parser and planner for 
streaming SQL, and also facilitating the creation of a standard language and 
semantics for streaming SQL. For the latter, see the Streaming page on Calcite’s 
web site[1]. On that page, I intend to illustrate all of the main patterns of 
streaming queries, give them names (e.g. “Tumbling windows”), and show how 
they translate into streaming SQL.

Also, it would be useful to create a reference implementation of streaming SQL 
in Calcite so that you can validate and run queries. The performance, 
scalability and reliability will not be the same as if you ran Storm, Flink or 
Samza, but at least you can see what the semantics should be.

I believe that most, if not all, of the examples that the projects are coming 
up with can be translated into SQL. It will be challenging, because we want to 
preserve the semantics of SQL, allow streaming SQL to interoperate with 
traditional relations, and also retain the general look and feel of SQL. (For 
example, I recently fought quite hard[2] for the principle that GROUP BY 
defines a partition (in the set-theory sense)[3] and therefore could not be 
used to represent a hopping window, in which each row belongs to several 
windows, until I remembered that GROUPING SETS already allows each input row 
to appear in more than one output sub-total.)
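
A minimal Python sketch may make the partition point concrete. It is not Calcite's implementation or API; the helper names, integer timestamps and window sizes below are hypothetical. A tumbling window assigns each row to exactly one window, so the windows partition the rows, which is what plain GROUP BY expresses. A hopping window whose size is larger than its slide assigns each row to several windows, much as GROUPING SETS lets one input row contribute to more than one sub-total.

```python
from collections import Counter
from functools import partial

# Hypothetical helpers for illustration only; this is not Calcite code or API.
# Timestamps are plain integers (say, minutes) and windows are identified by
# their start time; each window covers the half-open interval [start, start + size).


def tumbling_windows(ts, size):
    """Return a one-element list: the start of the only tumbling window holding ts."""
    return [ts - ts % size]


def hopping_windows(ts, size, slide):
    """Return the starts of all hopping windows that contain ts.

    Window starts are assumed to be aligned to multiples of `slide`.
    """
    starts = []
    start = ts - ts % slide  # latest window starting at or before ts
    while start > ts - size:  # window [start, start + size) still covers ts
        starts.append(start)
        start -= slide
    return sorted(starts)


def is_partition(rows, assign):
    """True if every row lands in exactly one window (the windows partition the rows)."""
    return all(len(assign(ts)) == 1 for ts in rows)


def count_per_window(rows, assign):
    """COUNT(*) per window; a row is counted once in every window it belongs to."""
    counts = Counter()
    for ts in rows:
        for start in assign(ts):
            counts[start] += 1
    return dict(sorted(counts.items()))


rows = [0, 5, 12, 17, 25]  # hypothetical event timestamps
tumble = partial(tumbling_windows, size=10)
hop = partial(hopping_windows, size=20, slide=10)

print(is_partition(rows, tumble))      # True
print(is_partition(rows, hop))         # False
print(count_per_window(rows, tumble))  # {0: 2, 10: 2, 20: 1}
print(count_per_window(rows, hop))     # {-10: 2, 0: 4, 10: 3, 20: 1}
```

With the tumbling windows, the per-window counts add up to the five input rows; with the hopping windows they add up to ten, because each row contributes to two sub-totals, which is the same shape of result that GROUPING SETS produces.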

What can you, the users, do? Get involved in the discussion about what you want 
in the language. Encourage the projects to bring their proposed SQL features 
into this forum for discussion, and add to the list of patterns and examples on 
the Streaming page. As in any standards process, the users help to keep the 
vendors focused.

I’ll be talking about streaming SQL, planning, and standardization at the Samza 
meetup in 2 weeks[4], so if any of you are in the Bay Area, please stop by.

Julian

[1] http://calcite.apache.org/docs/stream.html

[2] http://mail-archives.apache.org/mod_mbox/calcite-dev/201506.mbox/%3CCAPSgeETbowxM2TRX0RFxQ_tEAPk2uM=he0arywinbtovgwb...@mail.gmail.com%3E

[3] https://en.wikipedia.org/wiki/Partition_of_a_set

[4] http://www.meetup.com/Bay-Area-Samza-Meetup/events/228430492/

> On Jan 29, 2016, at 10:29 PM, Wanglan (Lan) wrote:
> 
> Hi to all,
> 
> I am from Huawei and am focusing on data stream processing.
> Recently I noticed that both the Storm community and the Flink community are 
> making efforts to use Calcite as the SQL parser so that Storm/Flink can support 
> SQL. They both want to supplement or clarify Calcite's Streaming SQL, 
> especially the definition of windows.
> I am concerned that if both communities design Stream SQL syntax 
> separately, they will come up with two different syntaxes that represent the 
> same use case.
> Therefore, I am wondering if it is possible to unify this work, i.e. design 
> and complement Calcite's Streaming SQL to enrich the window definition so that 
> both Storm and Flink can reuse Calcite (Streaming SQL) as their SQL parser 
> for streaming cases with little change.
> What do you think about this idea?
>
