I’m not personally a fan of wikis. (If your goal is a web site that can be edited by users, then markdown + PRs works pretty well in the Apache world. And you’re less likely to end up with an unstructured mess.)
But I do strongly believe in sketching out a specification (and, if not obvious, a design) before starting work. It allows for feedback before the person doing the work is too invested. The Stream-SQL-TCK project is not under Apache, so it would use GitHub issues rather than Apache JIRA cases, but it would work the same way.

> On Feb 17, 2018, at 10:18 AM, Edmon Begoli <[email protected]> wrote:
>
> I made a comment, but it might be better to state it here.
>
> Since you are starting from scratch, could you maybe add to the wiki your
> requirements/design thoughts, so that we could understand the intent and
> perhaps help?
>
> I am new to Apache contributions, so I might not know that PRs are the
> preferred way, but I personally get a better understanding from a little
> bit of a big-picture write-up, even if it is a bunch of bullet points and
> references.
>
> On Friday, February 16, 2018, Julian Hyde <[email protected]> wrote:
>
>> I have kicked off development with the first PR to the Stream-SQL-TCK
>> repository, and I have logged an issue for what I intend to work on next.
>>
>> Please review the PR [1], comment on the issues [2], and "watch" the
>> GitHub repo so that you are notified of new issues and PRs.
>>
>> If you disagree with the approach, feel free to say so. The PR is
>> something of a straw man. I'd rather have a negative reaction than no
>> reaction.
>>
>> Julian
>>
>> [1] https://github.com/Stream-SQL-TCK/Stream-SQL-TCK/pulls
>>
>> [2] https://github.com/Stream-SQL-TCK/Stream-SQL-TCK/issues
>>
>>> On Feb 9, 2018, at 11:44 PM, Julian Hyde <[email protected]> wrote:
>>>
>>> As you know, I am a big believer that SQL is a great language not just
>>> for data at rest, but also for data in flight.
>>> Calcite has extensions to SQL for streaming queries, and a reference
>>> implementation, and I have spoken about streaming SQL at several
>>> conferences over the years. Several projects, including Apex, Beam,
>>> Flink and Storm, have leveraged Calcite to add streaming SQL support.
>>>
>>> But SQL becomes truly valuable when people can assume that its
>>> features exist in every product in the market. It makes their
>>> applications portable, and it makes it easier for them to apply their
>>> skills to new products. So it is important that streaming SQL becomes
>>> a standard.
>>>
>>> The official SQL standard is written by ANSI/ISO and is dominated by
>>> large vendors, and I don't even know how to engage with them. But the
>>> interesting work on streaming systems is happening in Apache, so it
>>> makes sense to start closer to home. After conversations with folks
>>> from a few projects - some of those mentioned above, plus Kafka and
>>> Spark - a group of us have concluded that the next step is to develop
>>> a standard using the Apache way: by open discussion, by making
>>> decisions by consensus, by iteratively developing and reviewing code,
>>> and by releasing that code periodically.
>>>
>>> How can you develop a standard by writing software? The idea is to
>>> develop a Test Compatibility Kit (TCK), a suite of tests that embodies
>>> the standard. If you are the author of a streaming engine, you can
>>> download the TCK and run it against your engine, and the tests tell
>>> you whether your engine is compliant.
>>>
>>> The TCK is developed by committers from the participating engines. If
>>> we want to add a new feature to streaming SQL, say stream-to-stream
>>> joins, then we add tests to the TCK and achieve consensus about the
>>> SQL syntax and the expected behavior - which rows will be emitted, at
>>> what times, and in what order, for given inputs to a query.
>>>
>>> Our plan is to use this list - dev@calcite - for discussions, and to
>>> use a GitHub project (under the Apache license but outside the ASF)
>>> for code and issues.
>>>
>>> Kenn Knowles has already created the project:
>>> https://github.com/Stream-SQL-TCK/Stream-SQL-TCK
>>>
>>> Next steps are to design a language for the tests, figure out which
>>> features we would like to test in our first release, and start
>>> writing the first few tests.
>>>
>>> Here are the basic features we might test in the first release:
>>> * SELECT ... FROM
>>> * WHERE
>>> * GROUP BY with Hop and Tumble windowing functions
>>> * UNION ALL
>>> * Query a table (no streams involved)
>>> * JOIN a stream to a stream
>>> * JOIN a stream to a static table
>>>
>>> Here are more advanced features we might test in later releases:
>>> * GROUP BY with Session windowing function
>>> * MATCH_RECOGNIZE
>>> * Arbitrary stateful processing
>>> * Injected UDFs
>>> * Windowed aggregate functions (OVER)
>>> * JOIN a stream to a time-varying table
>>> * A mechanism to emit early results (EMIT)
>>>
>>> All of the above are subject to discussion and change.
>>>
>>> Here is my sketch of a test:
>>>
>>> test "filter-equals" {
>>>   decls {
>>>     CREATE Orders (TIMESTAMP rowtime, INT orderId, VARCHAR product);
>>>   }
>>>   queries {
>>>     Q1: SELECT STREAM * FROM Orders WHERE product = 'soda'
>>>   }
>>>   input {
>>>     Orders ('00:01', 10, 'beer')
>>>     Orders ('00:03', 11, 'soda')
>>>   }
>>>   output {
>>>     Q1 ('00:03', 11, 'soda')
>>>   }
>>> }
>>>
>>> Again, this is subject to change. In particular, don't worry too much
>>> about the syntax; that will certainly change. But it shows what pieces
>>> of information are necessary to define a test without making any
>>> reference to the engine that will execute that test.
>>>
>>> If you're interested in participating in this project, you are most
>>> welcome. Please raise your hand by joining the discussion on this
>>> list.
>>> Also, start logging cases in the GitHub project, and start writing
>>> pull requests.
>>>
>>> Julian
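
To make the "filter-equals" sketch in the quoted email a little more concrete, here is a minimal Python sketch of how a TCK harness might represent that test and check an engine's output against it. This is purely illustrative and hedged: the `StreamTest` class, the `check` function, and the `toy_engine` stand-in are assumptions of mine, not the actual Stream-SQL-TCK API or its test language.

```python
# Hypothetical sketch: a harness-side representation of one TCK test.
# Only the data comes from the "filter-equals" sketch; everything else
# (class names, function names) is an illustrative assumption.
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A row of the Orders stream: (rowtime, orderId, product).
Row = Tuple[str, int, str]

@dataclass
class StreamTest:
    name: str
    query: str             # SQL text; opaque to the harness
    input_rows: List[Row]  # rows fed to the Orders stream, in order
    expected: List[Row]    # rows the engine must emit, in order

# The test from the sketch, transcribed literally.
FILTER_EQUALS = StreamTest(
    name="filter-equals",
    query="SELECT STREAM * FROM Orders WHERE product = 'soda'",
    input_rows=[("00:01", 10, "beer"), ("00:03", 11, "soda")],
    expected=[("00:03", 11, "soda")],
)

def check(test: StreamTest,
          engine: Callable[[str, List[Row]], List[Row]]) -> bool:
    """Run an engine on the test's query and input, then compare the
    emitted rows (values, times, and order) against the expected output."""
    return engine(test.query, test.input_rows) == test.expected

# A toy "engine" that handles only this one query, just to show the
# contract: the harness never sees engine internals, only emitted rows.
def toy_engine(query: str, rows: List[Row]) -> List[Row]:
    return [r for r in rows if r[2] == "soda"]

print(check(FILTER_EQUALS, toy_engine))  # True
```

The point of the sketch is the shape of the contract, not the code: a test names its declarations, queries, timestamped inputs, and expected timestamped outputs, and any engine that maps the first three to the fourth passes, with no reference to how the engine executes the query.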
