> If you can make committees aware of this effort, I would be very grateful.
Julian — I can and will. I have officially joined the INCITS 32 and 32.2 standards group, and I should be able to serve as a Calcite/open-source SQL advocate to this standards group, and to inform our community of what is being proposed.

Edmon

On Fri, Feb 16, 2018 at 17:59 Julian Hyde <[email protected]> wrote:

> Edmon,
>
> If you can make committees aware of this effort, I would be very grateful.
> It’s very clear to me that people are using streaming SQL in the real
> world, and there is a need to standardize. Since we’re focusing on a TCK
> rather than specifications, our efforts could complement those of the
> standards committees.
>
> Julian
>
>
> > On Feb 9, 2018, at 11:53 PM, Edmon Begoli <[email protected]> wrote:
> >
> > Julian,
> >
> > I am certainly interested in participating in the discussion, and in the
> > initiative -- time permitting.
> > In my environment, streaming data from large environmental sensor
> > networks is a common challenge.
> >
> > Riccardo Tomassini and I just this week discussed the research interests
> > and work in stream reasoning.
> >
> > In terms of standards, and influencing those: I am participating in some
> > data standards committees, so this participation might be a way for us
> > (Calcite and related projects) to have a voice in terms of contributions
> > to, or influence on, the streaming standards.
> >
> > You are right that it is mostly big vendors, but I think there is room
> > for us to have a say.
> >
> > Thank you for the initiative,
> > Edmon
> >
> > On Sat, Feb 10, 2018 at 2:44 AM, Julian Hyde <[email protected]> wrote:
> >
> >> As you know, I am a big believer that SQL is a great language not just
> >> for data at rest, but also data in flight. Calcite has extensions to
> >> SQL for streaming queries, and a reference implementation, and I have
> >> spoken about streaming SQL at several conferences over the years.
> >> Several projects, including Apex, Beam, Flink and Storm, have
> >> leveraged Calcite to add streaming SQL support.
> >>
> >> But SQL becomes truly valuable when people can assume that its
> >> features exist in every product in the market. It makes their
> >> applications portable, and it makes it easier for them to apply their
> >> skills to new products. So, it is important that streaming SQL becomes
> >> standard.
> >>
> >> The official SQL standard is written by ANSI/ISO and is dominated by
> >> large vendors, and I don't even know how to engage with them. But the
> >> interesting work on streaming systems is happening in Apache, so it
> >> makes sense to start closer to home. After conversations with folks
> >> from a few projects - some of those mentioned above, plus Kafka and
> >> Spark - a group of us have concluded that the next step is to develop
> >> a standard using the Apache way: by open discussion, making decisions
> >> by consensus, by iteratively developing and reviewing code, and by
> >> releasing that code periodically.
> >>
> >> How can you develop a standard by writing software? The idea is to
> >> develop a Test Compatibility Kit (TCK), a suite of tests that embodies
> >> the standard. If you are the author of a streaming engine, you can
> >> download the TCK and run it against your engine, and the tests tell
> >> you whether your engine is compliant.
> >>
> >> The TCK is developed by committers from the participating engines. If
> >> we want to add a new feature to streaming SQL, say stream-to-stream
> >> joins, then we would add tests to the TCK, and achieve consensus about
> >> the SQL syntax and the expected behavior - which rows will be emitted,
> >> at what times, and in what order, for given inputs to a query.
> >>
> >> Our plan is to use this list - dev@calcite - for discussions, and use
> >> a GitHub project (under the Apache license but outside the ASF) for
> >> code and issues.
> >>
> >> Kenn Knowles has already created the project:
> >> https://github.com/Stream-SQL-TCK/Stream-SQL-TCK
> >>
> >> Next steps are to design a language for the tests, figure out which
> >> features we would like to test in our first release, and start writing
> >> the first few tests.
> >>
> >> Here are the basic features we might test in the first release:
> >> * SELECT ... FROM
> >> * WHERE
> >> * GROUP BY with Hop and Tumble windowing functions
> >> * UNION ALL
> >> * Query a table (no streams involved)
> >> * JOIN a stream to a stream
> >> * JOIN a stream to a static table
> >>
> >> Here are more advanced features we might test in later releases:
> >> * GROUP BY with Session windowing function
> >> * MATCH_RECOGNIZE
> >> * Arbitrary stateful processing
> >> * Injected UDFs
> >> * Windowed aggregate functions (OVER)
> >> * JOIN a stream to a time-varying table
> >> * Mechanism to emit early results (EMIT)
> >>
> >> All of the above are subject to discussion & change.
> >>
> >> Here is my sketch of a test:
> >>
> >> test "filter-equals" {
> >>   decls {
> >>     CREATE Orders (TIMESTAMP rowtime, INT orderId, VARCHAR product);
> >>   }
> >>   queries {
> >>     Q1: SELECT STREAM * FROM Orders WHERE product = 'soda'
> >>   }
> >>   input {
> >>     Orders ('00:01', 10, 'beer')
> >>     Orders ('00:03', 11, 'soda')
> >>   }
> >>   output {
> >>     Q1 ('00:03', 11, 'soda')
> >>   }
> >> }
> >>
> >> Again, subject to change. Especially, don't worry too much about the
> >> syntax; that will certainly change. But it shows what pieces of
> >> information are necessary to define a test without making any
> >> reference to the engine that will execute that test.
> >>
> >> If you're interested in participating in this project, you are most
> >> welcome. Please raise your hand by joining the discussion on this
> >> list. Also, start logging cases in the GitHub project, and start
> >> writing pull requests.
> >>
> >> Julian
> >>
> >
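[Editor's note: the "filter-equals" sketch in Julian's email can be exercised against an in-memory reference implementation. The following is a minimal sketch of that idea, not part of the actual TCK; the names `Row`, `run_stream_filter`, and the reference semantics are illustrative assumptions.]

```python
# Hypothetical sketch: the "filter-equals" TCK test evaluated against a
# trivial in-memory reference engine. All names here are illustrative
# assumptions; the real TCK's test language and harness are still TBD.
from dataclasses import dataclass

@dataclass(frozen=True)
class Row:
    """One stream element: event timestamp plus payload columns."""
    rowtime: str   # event time, simplified to a string like '00:03'
    order_id: int
    product: str

def run_stream_filter(rows, predicate):
    """Assumed reference semantics for SELECT STREAM * ... WHERE <pred>:
    emit exactly the matching rows, in arrival order, timestamps intact."""
    return [r for r in rows if predicate(r)]

# Input section of the test sketch.
orders = [
    Row('00:01', 10, 'beer'),
    Row('00:03', 11, 'soda'),
]

# Q1: SELECT STREAM * FROM Orders WHERE product = 'soda'
actual = run_stream_filter(orders, lambda r: r.product == 'soda')

# Output section of the test sketch: only the 'soda' row is emitted.
expected = [Row('00:03', 11, 'soda')]
assert actual == expected
```

A real harness would of course parse the test's decls/queries/input/output sections and hand the query to the engine under test; the point above is only that the expected output is fully determined by the test, independent of any engine.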
