Streaming SQL is an area of growing interest. Several projects are moving forward, adding functionality such as windowing and improving the tooling around it. Much of that work is driven or inspired by Apache Calcite. The arrival of a new effort to define a TCK is an exciting and logical next step.
If anyone here is interested in evolving SQL in Apex and beyond, consider joining the ongoing discussions and efforts in Calcite.

Thanks,
Thomas

---------- Forwarded message ----------
From: Julian Hyde <jh...@apache.org>
Date: Fri, Feb 9, 2018 at 11:44 PM
Subject: "Standardizing" streaming SQL
To: dev <d...@calcite.apache.org>

As you know, I am a big believer that SQL is a great language not just for data at rest, but also for data in flight. Calcite has extensions to SQL for streaming queries, and a reference implementation, and I have spoken about streaming SQL at several conferences over the years. Several projects, including Apex, Beam, Flink and Storm, have leveraged Calcite to add streaming SQL support.

But SQL becomes truly valuable when people can assume that its features exist in every product on the market. That makes their applications portable, and it makes it easier for them to apply their skills to new products. So it is important that streaming SQL becomes standard.

The official SQL standard is written by ANSI/ISO and is dominated by large vendors, and I don't even know how to engage with them. But the interesting work on streaming systems is happening in Apache, so it makes sense to start closer to home. A group of us has talked with folks from a few projects: some of those mentioned above, plus Kafka and Spark. We have concluded that the next step is to develop a standard the Apache way. That means open discussion, decisions by consensus, iterative development and review of code, and periodic releases of that code.

How can you develop a standard by writing software? The idea is to develop a Test Compatibility Kit (TCK), a suite of tests that embodies the standard. If you are the author of a streaming engine, you can download the TCK and run it against your engine, and the tests tell you whether your engine is compliant. The TCK is developed by committers from the participating engines.
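As a rough illustration of "a suite of tests that embodies the standard", here is a minimal Python sketch of how a TCK runner might check an engine. It is a sketch under assumptions, not the project's design. The `TckTest` record, the `Engine` adapter protocol with its `run(decls, query, inputs)` method, and `PassThroughEngine` are all hypothetical. The Stream-SQL-TCK test language had not been designed yet.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence

Row = tuple  # one input or output row: (rowtime, col1, col2, ...)


@dataclass(frozen=True)
class TckTest:
    """Hypothetical engine-neutral test case: everything needed to run and judge it."""
    name: str
    decls: Sequence[str]                 # DDL for the streams and tables involved
    query: str                           # the streaming query under test
    inputs: Sequence[tuple[str, Row]]    # (stream name, row) pairs in arrival order
    expected: Sequence[Row]              # rows the query must emit, in this order


class Engine(Protocol):
    """Hypothetical adapter that each engine's authors would implement."""

    def run(self, decls: Sequence[str], query: str,
            inputs: Sequence[tuple[str, Row]]) -> list[Row]:
        ...


def run_tck(engine: Engine, tests: Sequence[TckTest]) -> dict[str, bool]:
    """Run every test; a test passes only if the emitted rows match exactly, in order."""
    results: dict[str, bool] = {}
    for test in tests:
        try:
            actual = engine.run(test.decls, test.query, test.inputs)
        except Exception:
            actual = None  # an engine that cannot run the query fails the test
        results[test.name] = actual == list(test.expected)
    return results


def is_compliant(results: dict[str, bool]) -> bool:
    """An engine is compliant only if it passes every test in the suite."""
    return all(results.values())


class PassThroughEngine:
    """Toy engine for demonstration: ignores the query and echoes every input row."""

    def run(self, decls, query, inputs):
        return [row for _, row in inputs]
```

With `PassThroughEngine`, a test whose expected output equals its input passes, while a filtering test fails. The engine is therefore reported as non-compliant. This sketch compares only row values and their order. A real runner would also need to check *when* rows are emitted, which it leaves out.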
Suppose we want to add a new feature to streaming SQL, say stream-to-stream joins. We would add tests to the TCK and reach consensus on the SQL syntax and the expected behavior: which rows will be emitted, at what times, and in what order, for given inputs to a query.

Our plan is to use this list (dev@calcite) for discussions, and to use a GitHub project (under the Apache License but outside the ASF) for code and issues. Kenn Knowles has already created the project: https://github.com/Stream-SQL-TCK/Stream-SQL-TCK

The next steps are to design a language for the tests, decide which features we would like to test in our first release, and start writing the first few tests.

Here are the basic features we might test in the first release:

* SELECT ... FROM
* WHERE
* GROUP BY with Hop and Tumble windowing functions
* UNION ALL
* Query a table (no streams involved)
* JOIN a stream to a stream
* JOIN a stream to a static table

Here are more advanced features we might test in later releases:

* GROUP BY with Session windowing function
* MATCH_RECOGNIZE
* Arbitrary stateful processing
* Injected UDFs
* Windowed aggregate functions (OVER)
* JOIN a stream to a time-varying table
* Mechanism to emit early results (EMIT)

All of the above are subject to discussion and change.

Here is my sketch of a test:

    test "filter-equals" {
      decls {
        CREATE Orders (TIMESTAMP rowtime, INT orderId, VARCHAR product);
      }
      queries {
        Q1: SELECT STREAM * FROM Orders WHERE product = 'soda'
      }
      input {
        Orders ('00:01', 10, 'beer')
        Orders ('00:03', 11, 'soda')
      }
      output {
        Q1 ('00:03', 11, 'soda')
      }
    }

Again, this is subject to change. In particular, don't worry too much about the syntax; that will certainly change. But it shows what information is needed to define a test without referring to the engine that will execute it.

If you're interested in participating in this project, you are most welcome. Please raise your hand by joining the discussion on this list.
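To make the "filter-equals" sketch more concrete, the following Python encodes its four pieces as plain data: declarations, query, input and expected output. It then checks them against a toy evaluator. The data layout and the helpers `filter_equals_engine` and `check` are hypothetical. The evaluator handles only Q1 and hard-codes the predicate `product = 'soda'` rather than parsing SQL. The point is only that the test itself never refers to a particular engine.

```python
from typing import Iterable, Iterator

# The "filter-equals" test expressed as data (layout is a hypothetical stand-in for the TCK language).
DECLS = ["CREATE Orders (TIMESTAMP rowtime, INT orderId, VARCHAR product)"]
QUERIES = {"Q1": "SELECT STREAM * FROM Orders WHERE product = 'soda'"}
INPUT = [
    ("Orders", ("00:01", 10, "beer")),
    ("Orders", ("00:03", 11, "soda")),
]
EXPECTED = {"Q1": [("00:03", 11, "soda")]}


def filter_equals_engine(inputs: Iterable[tuple[str, tuple]]) -> Iterator[tuple]:
    """Toy streaming evaluator for Q1 only: emit each Orders row whose product is
    'soda' as soon as it arrives, keeping its rowtime and the arrival order."""
    for stream, row in inputs:
        if stream != "Orders":
            continue  # Q1 reads only from Orders
        rowtime, order_id, product = row
        if product == "soda":
            yield row


def check(expected: dict[str, list[tuple]], actual: dict[str, list[tuple]]) -> bool:
    """A test passes only if every query emits exactly the expected rows, in order."""
    return expected.keys() == actual.keys() and all(
        actual[q] == rows for q, rows in expected.items()
    )


actual = {"Q1": list(filter_equals_engine(INPUT))}
passed = check(EXPECTED, actual)
```

Because the expected output is an ordered list, this check also covers "in what order". Checking "at what times" would need the test data to carry emission times, for example relative to watermarks. This sketch does not attempt that.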
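The first-release list includes GROUP BY with Hop and Tumble windowing functions. The Python sketch below illustrates one piece of semantics a TCK would have to pin down: which window(s) each row belongs to. A tumbling window of length `size` puts each row in exactly one window. A hopping window of length `size` that advances by `slide` can put a row in several overlapping windows. This rests on simplifying assumptions:

* Times are plain integers.
* Windows are aligned to time 0.
* The function names `tumble`, `hop` and `count_per_window` are illustrative and are not Calcite's TUMBLE/HOP signatures.

```python
from collections import Counter
from typing import Callable, Iterable

Window = tuple[int, int]  # half-open interval [start, end)


def tumble(ts: int, size: int) -> list[Window]:
    """Tumbling window: the one window of length `size` that contains ts."""
    start = ts - ts % size
    return [(start, start + size)]


def hop(ts: int, slide: int, size: int) -> list[Window]:
    """Hopping window: every window [start, start + size), with start a multiple
    of `slide`, that contains ts. Windows overlap when size > slide."""
    windows = []
    start = ts - ts % slide  # latest window start at or before ts
    while start > ts - size:  # window [start, start + size) still contains ts
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)


def count_per_window(rowtimes: Iterable[int],
                     assign: Callable[[int], list[Window]]) -> dict[Window, int]:
    """Roughly what a COUNT(*) grouped by window computes, ignoring when each
    window's result would actually be emitted."""
    counts: Counter = Counter()
    for ts in rowtimes:
        for window in assign(ts):
            counts[window] += 1
    return dict(sorted(counts.items()))
```

Take rowtimes 1, 3, 7 and 12. With tumbling windows of length 5 (`lambda t: tumble(t, 5)`), two rows fall in [0, 5) and one each in [5, 10) and [10, 15). With `hop(t, 5, 10)`, the row at 7 is counted in both [0, 10) and [5, 15). The sketch deliberately says nothing about *when* each window's result is emitted. That is exactly the kind of behavior the TCK tests would have to specify.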
Also, start logging cases in the GitHub project, and start writing pull requests.

Julian