I made a comment on the PR, but it might be better to state it here. Since you are starting from scratch, could you maybe add your requirements/design thoughts to the wiki, so that we could understand the intent and perhaps help?
I am new to Apache contributions, so I might not know that PRs are the preferred way, but I personally get a better understanding from a little bit of a big-picture write-up, even if it is a bunch of bullet points and references.

On Friday, February 16, 2018, Julian Hyde <[email protected]> wrote:

> I have kicked off development with the first PR to the Stream-SQL-TCK
> repository, and I have logged an issue for what I intend to work on next.
>
> Please review the PR [1] and comment on the issues [2], and “watch” the
> GitHub repo so that you are notified of new issues and PRs.
>
> If you disagree with the approach, feel free to say so. The PR is
> something of a straw man. I’d rather have a negative reaction than no
> reaction.
>
> Julian
>
> [1] https://github.com/Stream-SQL-TCK/Stream-SQL-TCK/pulls
> [2] https://github.com/Stream-SQL-TCK/Stream-SQL-TCK/issues
>
> > On Feb 9, 2018, at 11:44 PM, Julian Hyde <[email protected]> wrote:
> >
> > As you know, I am a big believer that SQL is a great language not just
> > for data at rest, but also for data in flight. Calcite has extensions
> > to SQL for streaming queries, and a reference implementation, and I
> > have spoken about streaming SQL at several conferences over the years.
> > Several projects, including Apex, Beam, Flink and Storm, have
> > leveraged Calcite to add streaming SQL support.
> >
> > But SQL becomes truly valuable when people can assume that its
> > features exist in every product in the market. It makes their
> > applications portable, and it makes it easier for them to apply their
> > skills to new products. So, it is important that streaming SQL becomes
> > standard.
> >
> > The official SQL standard is written by ANSI/ISO and is dominated by
> > large vendors, and I don't even know how to engage with them.
> > But the interesting work on streaming systems is happening in Apache,
> > so it makes sense to start closer to home. After conversations with
> > folks from a few projects - some of those mentioned above, plus Kafka
> > and Spark - a group of us have concluded that the next step is to
> > develop a standard using the Apache way: by open discussion, by making
> > decisions by consensus, by iteratively developing and reviewing code,
> > and by releasing that code periodically.
> >
> > How can you develop a standard by writing software? The idea is to
> > develop a Test Compatibility Kit (TCK), a suite of tests that embodies
> > the standard. If you are the author of a streaming engine, you can
> > download the TCK and run it against your engine, and the tests tell
> > you whether your engine is compliant.
> >
> > The TCK is developed by committers from the participating engines. If
> > we want to add a new feature to streaming SQL, say stream-to-stream
> > joins, then we would add tests to the TCK, and achieve consensus about
> > the SQL syntax and the expected behavior - which rows will be emitted,
> > at what times, and in what order, for given inputs to a query.
> >
> > Our plan is to use this list - dev@calcite - for discussions, and use
> > a GitHub project (under the Apache license but outside the ASF) for
> > code and issues.
> >
> > Kenn Knowles has already created the project:
> > https://github.com/Stream-SQL-TCK/Stream-SQL-TCK
> >
> > Next steps are to design a language for the tests, figure out which
> > features we would like to test in our first release, and start writing
> > the first few tests.
> >
> > Here are the basic features we might test in the first release:
> > * SELECT ... FROM
> > * WHERE
> > * GROUP BY with Hop and Tumble windowing functions
> > * UNION ALL
> > * Query a table (no streams involved)
> > * JOIN a stream to a stream
> > * JOIN a stream to a static table
> >
> > Here are more advanced features we might test in later releases:
> > * GROUP BY with Session windowing function
> > * MATCH_RECOGNIZE
> > * Arbitrary stateful processing
> > * Injected UDFs
> > * Windowed aggregate functions (OVER)
> > * JOIN a stream to a time-varying table
> > * Mechanism to emit early results (EMIT)
> >
> > All of the above are subject to discussion & change.
> >
> > Here is my sketch of a test:
> >
> >   test "filter-equals" {
> >     decls {
> >       CREATE Orders (TIMESTAMP rowtime, INT orderId, VARCHAR product);
> >     }
> >     queries {
> >       Q1: SELECT STREAM * FROM Orders WHERE product = 'soda'
> >     }
> >     input {
> >       Orders ('00:01', 10, 'beer')
> >       Orders ('00:03', 11, 'soda')
> >     }
> >     output {
> >       Q1 ('00:03', 11, 'soda')
> >     }
> >   }
> >
> > Again, subject to change. Especially, don't worry too much about the
> > syntax; that will certainly change. But it shows what pieces of
> > information are necessary to define a test without making any
> > reference to the engine that will execute that test.
> >
> > If you're interested in participating in this project, you are most
> > welcome. Please raise your hand by joining the discussion on this
> > list. Also, start logging cases in the GitHub project, and start
> > writing pull requests.
> >
> > Julian
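To check that I read the intent of the "filter-equals" sketch correctly: the engine under test should emit, in order, exactly the input rows that satisfy Q1's predicate. Here is how I would paraphrase that as a tiny Python check — this is only my own illustration for this reply, not part of the TCK; the function name and the tuple encoding of rows are made up:

```python
# Hypothetical paraphrase of the "filter-equals" test sketched above:
# run the input rows for the Orders stream through the predicate of
# Q1 (product = 'soda') and compare what is emitted with the expected
# output section of the test.

def run_filter_equals(input_rows, predicate):
    """Emit, in arrival order, the input rows satisfying the predicate."""
    return [row for row in input_rows if predicate(row)]

# Input rows as (rowtime, orderId, product), per the test's `input` section.
orders = [
    ("00:01", 10, "beer"),
    ("00:03", 11, "soda"),
]

# Q1: SELECT STREAM * FROM Orders WHERE product = 'soda'
emitted = run_filter_equals(orders, lambda row: row[2] == "soda")

# Expected rows from the test's `output` section.
assert emitted == [("00:03", 11, "soda")]
```

If that is the intended semantics, then the interesting part of the TCK design is everything this toy version ignores: event time vs. arrival order, and when rows may be emitted.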
