I made a comment on the PR, but it might be better to state it here. Since you are starting from scratch, could you maybe add your requirements/design thoughts to the wiki, so that we could understand the intent and perhaps help?
I am new to Apache contributions, so I might not know that PRs are the preferred way, but I personally get a better understanding from a little bit of a big-picture write-up, even if it is a bunch of bullet points and references.

On Friday, February 16, 2018, Julian Hyde <[email protected]> wrote:

> I have kicked off development with the first PR to the Stream-SQL-TCK
> repository, and I have logged an issue for what I intend to work on next.
>
> Please review the PR [1] and comment on the issues [2], and “watch” the
> GitHub repo so that you are notified of new issues and PRs.
>
> If you disagree with the approach, feel free to say so. The PR is
> something of a straw man. I’d rather have a negative reaction than no
> reaction.
>
> Julian
>
> [1] https://github.com/Stream-SQL-TCK/Stream-SQL-TCK/pulls
> [2] https://github.com/Stream-SQL-TCK/Stream-SQL-TCK/issues
>
> > On Feb 9, 2018, at 11:44 PM, Julian Hyde <[email protected]> wrote:
> >
> > As you know, I am a big believer that SQL is a great language not just
> > for data at rest, but also for data in flight. Calcite has extensions
> > to SQL for streaming queries, and a reference implementation, and I
> > have spoken about streaming SQL at several conferences over the years.
> > Several projects, including Apex, Beam, Flink and Storm, have
> > leveraged Calcite to add streaming SQL support.
> >
> > But SQL becomes truly valuable when people can assume that its
> > features exist in every product in the market. It makes their
> > applications portable, and it makes it easier for them to apply their
> > skills to new products. So, it is important that streaming SQL becomes
> > standard.
> >
> > The official SQL standard is written by ANSI/ISO and is dominated by
> > large vendors, and I don't even know how to engage with them.
> > But the interesting work on streaming systems is happening in Apache,
> > so it makes sense to start closer to home. After conversations with
> > folks from a few projects - some of those mentioned above, plus Kafka
> > and Spark - a group of us have concluded that the next step is to
> > develop a standard using the Apache way: by open discussion, by making
> > decisions by consensus, by iteratively developing and reviewing code,
> > and by releasing that code periodically.
> >
> > How can you develop a standard by writing software? The idea is to
> > develop a Test Compatibility Kit (TCK), a suite of tests that embodies
> > the standard. If you are the author of a streaming engine, you can
> > download the TCK and run it against your engine, and the tests tell
> > you whether your engine is compliant.
> >
> > The TCK is developed by committers from the participating engines. If
> > we want to add a new feature to streaming SQL, say stream-to-stream
> > joins, then we would add tests to the TCK, and achieve consensus about
> > the SQL syntax and the expected behavior - which rows will be emitted,
> > at what times, and in what order, for given inputs to a query.
> >
> > Our plan is to use this list - dev@calcite - for discussions, and use
> > a GitHub project (under the Apache license but outside the ASF) for
> > code and issues.
> >
> > Kenn Knowles has already created the project:
> > https://github.com/Stream-SQL-TCK/Stream-SQL-TCK
> >
> > Next steps are to design a language for the tests, figure out which
> > features we would like to test in our first release, and start writing
> > the first few tests.
> >
> > Here are the basic features we might test in the first release:
> > * SELECT ... FROM
> > * WHERE
> > * GROUP BY with Hop and Tumble windowing functions
> > * UNION ALL
> > * Query a table (no streams involved)
> > * JOIN a stream to a stream
> > * JOIN a stream to a static table
> >
> > Here are more advanced features we might test in later releases:
> > * GROUP BY with Session windowing function
> > * MATCH_RECOGNIZE
> > * Arbitrary stateful processing
> > * Injected UDFs
> > * Windowed aggregate functions (OVER)
> > * JOIN a stream to a time-varying table
> > * Mechanism to emit early results (EMIT)
> >
> > All of the above are subject to discussion & change.
> >
> > Here is my sketch of a test:
> >
> >   test "filter-equals" {
> >     decls {
> >       CREATE Orders (TIMESTAMP rowtime, INT orderId, VARCHAR product);
> >     }
> >     queries {
> >       Q1: SELECT STREAM * FROM Orders WHERE product = 'soda'
> >     }
> >     input {
> >       Orders ('00:01', 10, 'beer')
> >       Orders ('00:03', 11, 'soda')
> >     }
> >     output {
> >       Q1 ('00:03', 11, 'soda')
> >     }
> >   }
> >
> > Again, subject to change. Especially, don't worry too much about the
> > syntax; that will certainly change. But it shows what pieces of
> > information are necessary to define a test without making any
> > reference to the engine that will execute that test.
> >
> > If you're interested in participating in this project, you are most
> > welcome. Please raise your hand by joining the discussion on this
> > list. Also, start logging cases in the GitHub project, and start
> > writing pull requests.
> >
> > Julian
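To check that I read the intent of the "filter-equals" sketch correctly: the engine under test should emit, in order, exactly the input rows that satisfy Q1's predicate. Here is how I would paraphrase that as a tiny Python check — this is only my own illustration for this reply, not part of the TCK; the function name and the tuple encoding of rows are made up:

```python
# Hypothetical paraphrase of the "filter-equals" test sketched above:
# run the input rows for the Orders stream through the predicate of
# Q1 (product = 'soda') and compare what is emitted with the expected
# output section of the test.

def run_filter_equals(input_rows, predicate):
    """Emit, in arrival order, the input rows satisfying the predicate."""
    return [row for row in input_rows if predicate(row)]

# Input rows as (rowtime, orderId, product), per the test's `input` section.
orders = [
    ("00:01", 10, "beer"),
    ("00:03", 11, "soda"),
]

# Q1: SELECT STREAM * FROM Orders WHERE product = 'soda'
emitted = run_filter_equals(orders, lambda row: row[2] == "soda")

# Expected rows from the test's `output` section.
assert emitted == [("00:03", 11, "soda")]
```

If that is the intended semantics, then the interesting part of the TCK design is everything this toy version ignores: event time vs. arrival order, and when rows may be emitted.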
