> If you can make committees aware of this effort, I would be very grateful.
Julian — I can and will. I have officially joined the INCITS 32 and 32.2 standards group, and I should be able to serve as a Calcite/open-source SQL advocate to this standards group, and to inform our community of what is being proposed.

Edmon

On Fri, Feb 16, 2018 at 17:59 Julian Hyde <[email protected]> wrote:

> Edmon,
>
> If you can make committees aware of this effort, I would be very grateful.
> It’s very clear to me that people are using streaming SQL in the real
> world, and there is a need to standardize. Since we’re focusing on a TCK
> rather than specifications, our efforts could complement those of the
> standards committees.
>
> Julian
>
>
> > On Feb 9, 2018, at 11:53 PM, Edmon Begoli <[email protected]> wrote:
> >
> > Julian,
> >
> > I am certainly interested in participating in the discussion, and in the
> > initiative -- time permitting.
> > In my environment, streaming data from large environmental sensor
> > networks is a common challenge.
> >
> > Riccardo Tomassini and I just this week discussed the research interests
> > and work in stream reasoning.
> >
> > In terms of standards, and influencing those: I am participating in some
> > data standards committees, so this participation might be a way for us
> > (Calcite and related projects) to have a voice in terms of contributions
> > to, or influence on, the streaming standards.
> >
> > You are right that it is mostly big vendors, but I think there is room
> > for us to have a say.
> >
> > Thank you for the initiative,
> > Edmon
> >
> > On Sat, Feb 10, 2018 at 2:44 AM, Julian Hyde <[email protected]> wrote:
> >
> >> As you know, I am a big believer that SQL is a great language not just
> >> for data at rest, but also data in flight. Calcite has extensions to
> >> SQL for streaming queries, and a reference implementation, and I have
> >> spoken about streaming SQL at several conferences over the years.
> >> Several projects, including Apex, Beam, Flink and Storm, have
> >> leveraged Calcite to add streaming SQL support.
> >>
> >> But SQL becomes truly valuable when people can assume that its
> >> features exist in every product in the market. It makes their
> >> applications portable, and it makes it easier for them to apply their
> >> skills to new products. So, it is important that streaming SQL becomes
> >> standard.
> >>
> >> The official SQL standard is written by ANSI/ISO and is dominated by
> >> large vendors, and I don't even know how to engage with them. But the
> >> interesting work on streaming systems is happening in Apache, so it
> >> makes sense to start closer to home. After conversations with folks
> >> from a few projects - some of those mentioned above, plus Kafka and
> >> Spark - a group of us have concluded that the next step is to develop
> >> a standard using the Apache way: by open discussion, making decisions
> >> by consensus, by iteratively developing and reviewing code, and by
> >> releasing that code periodically.
> >>
> >> How can you develop a standard by writing software? The idea is to
> >> develop a Test Compatibility Kit (TCK), a suite of tests that embodies
> >> the standard. If you are the author of a streaming engine, you can
> >> download the TCK and run it against your engine, and the tests tell
> >> you whether your engine is compliant.
> >>
> >> The TCK is developed by committers from the participating engines. If
> >> we want to add a new feature to streaming SQL, say stream-to-stream
> >> joins, then we would add tests to the TCK, and achieve consensus about
> >> the SQL syntax and the expected behavior - which rows will be emitted,
> >> at what times, and in what order, for given inputs to a query.
> >>
> >> Our plan is to use this list - dev@calcite - for discussions, and use
> >> a GitHub project (under the Apache license but outside the ASF) for
> >> code and issues.
> >>
> >> Kenn Knowles has already created the project:
> >> https://github.com/Stream-SQL-TCK/Stream-SQL-TCK
> >>
> >> Next steps are to design a language for the tests, figure out which
> >> features we would like to test in our first release, and start writing
> >> the first few tests.
> >>
> >> Here are the basic features we might test in the first release:
> >> * SELECT ... FROM
> >> * WHERE
> >> * GROUP BY with Hop and Tumble windowing functions
> >> * UNION ALL
> >> * Query a table (no streams involved)
> >> * JOIN a stream to a stream
> >> * JOIN a stream to a static table
> >>
> >> Here are more advanced features we might test in later releases:
> >> * GROUP BY with Session windowing function
> >> * MATCH_RECOGNIZE
> >> * Arbitrary stateful processing
> >> * Injected UDFs
> >> * Windowed aggregate functions (OVER)
> >> * JOIN a stream to a time-varying table
> >> * Mechanism to emit early results (EMIT)
> >>
> >> All of the above are subject to discussion & change.
> >>
> >> Here is my sketch of a test:
> >>
> >> test "filter-equals" {
> >>   decls {
> >>     CREATE Orders (TIMESTAMP rowtime, INT orderId, VARCHAR product);
> >>   }
> >>   queries {
> >>     Q1: SELECT STREAM * FROM Orders WHERE product = 'soda'
> >>   }
> >>   input {
> >>     Orders ('00:01', 10, 'beer')
> >>     Orders ('00:03', 11, 'soda')
> >>   }
> >>   output {
> >>     Q1 ('00:03', 11, 'soda')
> >>   }
> >> }
> >>
> >> Again, subject to change. Especially, don't worry too much about the
> >> syntax; that will certainly change. But it shows what pieces of
> >> information are necessary to define a test without making any
> >> reference to the engine that will execute that test.
> >>
> >> If you're interested in participating in this project, you are most
> >> welcome. Please raise your hand by joining the discussion on this
> >> list. Also, start logging cases in the GitHub project, and start
> >> writing pull requests.
> >>
> >> Julian
> >>
> >
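[Editor's note: the "filter-equals" sketch in Julian's email can be exercised against an in-memory reference implementation. The following is a minimal sketch of that idea, not part of the actual TCK; the names `Row`, `run_stream_filter`, and the reference semantics are illustrative assumptions.]

```python
# Hypothetical sketch: the "filter-equals" TCK test evaluated against a
# trivial in-memory reference engine. All names here are illustrative
# assumptions; the real TCK's test language and harness are still TBD.
from dataclasses import dataclass

@dataclass(frozen=True)
class Row:
    """One stream element: event timestamp plus payload columns."""
    rowtime: str   # event time, simplified to a string like '00:03'
    order_id: int
    product: str

def run_stream_filter(rows, predicate):
    """Assumed reference semantics for SELECT STREAM * ... WHERE <pred>:
    emit exactly the matching rows, in arrival order, timestamps intact."""
    return [r for r in rows if predicate(r)]

# Input section of the test sketch.
orders = [
    Row('00:01', 10, 'beer'),
    Row('00:03', 11, 'soda'),
]

# Q1: SELECT STREAM * FROM Orders WHERE product = 'soda'
actual = run_stream_filter(orders, lambda r: r.product == 'soda')

# Output section of the test sketch: only the 'soda' row is emitted.
expected = [Row('00:03', 11, 'soda')]
assert actual == expected
```

A real harness would of course parse the test's decls/queries/input/output sections and hand the query to the engine under test; the point above is only that the expected output is fully determined by the test, independent of any engine.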
