Julian, I am certainly interested in participating in the discussion and in the initiative, time permitting. In my environment, streaming data from large environmental sensor networks is a common challenge.
Just this week, Riccardo Tomassini and I discussed research interests and work in stream reasoning. As for standards and influencing them: I participate in some data standards committees, so that participation might give us (Calcite and related projects) a voice in contributing to or influencing the streaming standards. You are right that it is mostly big vendors, but I think there is room for us to have a say.

Thank you for the initiative,
Edmon

On Sat, Feb 10, 2018 at 2:44 AM, Julian Hyde <jh...@apache.org> wrote:
> As you know, I am a big believer that SQL is a great language not just
> for data at rest, but also data in flight. Calcite has extensions to
> SQL for streaming queries, and a reference implementation, and I have
> spoken about streaming SQL at several conferences over the years.
> Several projects, including Apex, Beam, Flink and Storm, have
> leveraged Calcite to add streaming SQL support.
>
> But SQL becomes truly valuable when people can assume that its
> features exist in every product in the market. It makes their
> applications portable, and it makes it easier for them to apply their
> skills to new products. So, it is important that streaming SQL becomes
> standard.
>
> The official SQL standard is written by ANSI/ISO and is dominated by
> large vendors, and I don't even know how to engage with them. But the
> interesting work on streaming systems is happening in Apache, so it
> makes sense to start closer to home. After conversations with folks
> from a few projects - some of those mentioned above, plus Kafka and
> Spark - a group of us have concluded that the next step is to develop
> a standard using the Apache way - by open discussion, making decisions
> by consensus, by iteratively developing and reviewing code, and by
> releasing that code periodically.
>
> How can you develop a standard by writing software?
> The idea is to develop a Test Compatibility Kit (TCK), a suite of
> tests that embodies the standard. If you are the author of a
> streaming engine, you can download the TCK and run it against your
> engine, and the test tells you whether your engine is compliant.
>
> The TCK is developed by committers from the participating engines. If
> we want to add a new feature to streaming SQL, say stream-to-stream
> joins, then we would add tests to the TCK, and achieve consensus about
> the SQL syntax and the expected behavior - which rows will be emitted,
> at what times, and in what order, for given inputs to a query.
>
> Our plan is to use this list - dev@calcite - for discussions, and use
> a GitHub project (under Apache license but outside the ASF) for code
> and issues.
>
> Kenn Knowles has already created the project:
> https://github.com/Stream-SQL-TCK/Stream-SQL-TCK
>
> Next steps are to design a language for the tests, figure out which
> features we would like to test in our first release, and start writing
> the first few tests.
>
> Here are the basic features we might test in the first release:
> * SELECT ... FROM
> * WHERE
> * GROUP BY with Hop and Tumble windowing functions
> * UNION ALL
> * Query a table (no streams involved)
> * JOIN a stream to a stream
> * JOIN a stream to a static table
>
> Here are more advanced features we might test in later releases:
> * GROUP BY with Session windowing function
> * MATCH_RECOGNIZE
> * Arbitrary stateful processing
> * Injected UDFs
> * Windowed aggregate functions (OVER)
> * JOIN a stream to time-varying table
> * Mechanism to emit early results (EMIT)
>
> All of the above are subject to discussion & change.
>
> Here is my sketch of a test:
>
> test "filter-equals" {
>   decls {
>     CREATE Orders (TIMESTAMP rowtime, INT orderId, VARCHAR product);
>   }
>   queries {
>     Q1: SELECT STREAM * FROM Orders WHERE product = 'soda'
>   }
>   input {
>     Orders ('00:01', 10, 'beer')
>     Orders ('00:03', 11, 'soda')
>   }
>   output {
>     Q1 ('00:03', 11, 'soda')
>   }
> }
>
> Again, subject to change. Especially, don't worry too much about the
> syntax; that will certainly change. But it shows what pieces of
> information are necessary to define a test without making any
> reference to the engine that will execute that test.
>
> If you're interested in participating in this project, you are most
> welcome. Please raise your hand by joining the discussion on this
> list. Also, start logging cases in the GitHub project, and start
> writing pull requests.
>
> Julian
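
To make the "no reference to the engine" point concrete, here is a minimal Python sketch of how the quoted filter-equals test could be held as plain data and checked against any engine. Everything in it is an assumption for illustration: the names StreamTest, run_test and toy_engine are hypothetical and are not part of the TCK, and toy_engine hard-codes the filter instead of parsing SQL.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A row of the Orders stream: (rowtime, orderId, product).
Row = Tuple[str, int, str]


@dataclass
class StreamTest:
    """Engine-independent test case (hypothetical structure, not the TCK format)."""
    name: str
    decls: List[str]
    queries: Dict[str, str]
    inputs: List[Tuple[str, Row]]    # (stream name, row), in arrival order
    expected: List[Tuple[str, Row]]  # (query name, row), in emission order


FILTER_EQUALS = StreamTest(
    name="filter-equals",
    decls=["CREATE Orders (TIMESTAMP rowtime, INT orderId, VARCHAR product)"],
    queries={"Q1": "SELECT STREAM * FROM Orders WHERE product = 'soda'"},
    inputs=[
        ("Orders", ("00:01", 10, "beer")),
        ("Orders", ("00:03", 11, "soda")),
    ],
    expected=[("Q1", ("00:03", 11, "soda"))],
)

# An engine adapter takes a test and returns the (query name, row) pairs it emitted.
Engine = Callable[[StreamTest], List[Tuple[str, Row]]]


def toy_engine(test: StreamTest) -> List[Tuple[str, Row]]:
    """Stand-in for a real engine: hard-codes Q1's filter rather than parsing SQL."""
    return [
        ("Q1", row)
        for stream, row in test.inputs
        if stream == "Orders" and row[2] == "soda"
    ]


def run_test(test: StreamTest, engine: Engine) -> bool:
    """The engine passes if it emits exactly the expected rows, in order."""
    return engine(test) == test.expected
```

A real adapter would submit test.decls and test.queries to the engine under test and collect its output; the comparison in run_test would stay the same. Comparing emission times, which the quoted message also mentions, is left out of this sketch.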