There seem to be no other opinions, so I will go ahead and drop "group by" and "join" for Storm SQL Trident mode until we're ready to handle windowing in SQL semantics.
- Jungtaek Lim (HeartSaVioR)

On Sat, Oct 22, 2016 at 12:23 AM, Jungtaek Lim <[email protected]> wrote:

> Yes, there seem to be many things which already support CQL (for example
> InfluxDB); what I mean is that there's no Streaming SQL or CQL
> standard from the point of view of "SQL semantics".
>
> We can define a LINQ-style "API" and include aggregation and join after
> enough discussion (if my understanding is right, that's what Structured
> Streaming is, and the Flink Table API is also going that way), but we don't
> even have a higher-level API, and it would be duplicated work (and it
> should be, if the higher-level API is well defined).
>
> As I linked earlier, the Calcite project proposes its streaming SQL
> semantics, which define new keywords, new concepts, and so on. [1] The
> thing is that it's not implemented yet, since Calcite has a small
> community and most of its contributors are from SQL on Hadoop, not the
> streaming area. Only a few contributions have come from us and from the
> Flink side.
>
> Storm SQL has some remaining work even if we don't address aggregation for
> now, so it would not be easy to jump over to the Calcite side and discuss,
> persuade, or even help implement. One of the Flink committers initiated a
> discussion about "defining Streaming SQL semantics" [2], but it seems
> that not many devs are interested. It still seems to be an early stage
> for all.
>
> So working on our own and participating later is a valid option, and
> participating in Flink's discussion now is also a valid option. Storm
> SQL has limited contributors, so IMHO we need to prioritize and concentrate.
> Unless we have more contributors coming to Storm SQL, the former looks more
> realistic.
>
> - Jungtaek Lim (HeartSaVioR)
>
> [1] https://calcite.apache.org/docs/stream.html
> [2] http://mail-archives.apache.org/mod_mbox/flink-dev/201610.mbox/%3ccaadrtt2t397e_jpnjm6zh-ysn8i0oouno8bnxsotvflmwh5...@mail.gmail.com%3E
>
> On Fri, Oct 21, 2016 at 10:50 PM, Bobby Evans <[email protected]> wrote:
>
> I am not currently very involved with Storm SQL, so take my comments
> with a grain of salt. I am +1 on waiting to define groupby and join until
> we have a solid base that we can build this on.
> But I do want to contradict a bit the claim that there is no standard for
> streaming SQL. Technically that is true, but in general the streaming SQL
> solutions I have seen restrict the supported queries to either have no
> aggregations at all (pure element-by-element operations) or rely on the
> OVER clause, and more specifically on time-based windows.
> The issue is that the output of a streaming operation is not a table; it
> is a protocol that includes updates. This is where BEAM really shines,
> because it exposes all of that ugly underbelly and lets users have complete
> control over it. The API is rather complex, and I think a bit ugly, because
> of that. I would suggest that we define what we want it to look like, and
> let that drive the underlying implementation. If we are feeling ambitious,
> we can get together with Flink, Spark, and anyone else who might be
> interested and see if we can come to an understanding on what extensions to
> SQL we should put in to really make streaming work properly.
> - Bobby
>
> On Friday, October 21, 2016 2:18 AM, Jungtaek Lim <[email protected]>
> wrote:
>
> Hi devs,
>
> Sorry to send multiple mails at once. I had been resolving issues
> sequentially, and have now paused for a bit to reflect on the direction of
> Storm SQL.
>
> I'd like to propose destructive actions: dropping the GROUP BY and JOIN
> features from Storm SQL, which fortunately are not released yet.
>
> The reason for dropping these features is simple: they borrow Trident
> semantics (within a micro-batch, or stateful) and do not make sense as true
> "streaming" semantics.
>
> Spark and Flink interpret "streaming" aggregation and join as windowed
> operators. Since there's no SQL standard for streaming (not even a de facto
> one), they are adding the feature to their APIs (Structured Streaming for
> Spark, and Table API for Flink), and don't address it on the SQL side yet.
>
> I was eager to add more features to Storm SQL to make progress (even Bobby
> pointed out similarly), but after working on these things, I changed my
> mind: not confusing users is more important than adding features.
>
> Btw, Storm SQL "temporarily" relies on Trident since we don't have a
> higher-level API on core and we don't want to build the topology from the
> ground up. AFAIK, Trident was not chosen in order to live with micro-batch,
> and IMHO Storm SQL should run in a per-tuple streaming manner instead of
> micro-batch. Integrating the streams API into Storm SQL could be a great
> internal project as a POC of the streams API. Exactly-once needs to be
> addressed first.
>
> "GROUP BY" is also what SQE supports now (SQE aggregates in a stateful and
> exactly-once way), so I would like to hear opinions regarding this.
>
> Flink and Storm are waiting for Calcite to make progress on Streaming SQL:
> https://calcite.apache.org/docs/stream.html (For now, most of the
> definitions are not implemented yet.)
> This means that we might not support Streaming SQL semantics in SQL
> statements unless Calcite finishes its work. I think this is OK since
> there is much other work left on Storm SQL, and Storm SQL is experimental
> for now anyway (Spark Structured Streaming and Flink Streaming SQL are
> also in an alpha or experimental state).
>
> While waiting, we might want to have a LINQ-style API like the Table API and
> address aggregation and join from there, but it requires a huge amount of
> work and duplicates the streams API work (STORM-1961
> <https://issues.apache.org/jira/browse/STORM-1961>) in terms of adding a
> high-level API. IMHO, if the streams API is well defined, it should be
> fairly easy, and a LINQ-style API is not necessarily needed (though some
> may find it more convenient to use 'select', 'where', and so on).
>
> Please share your opinion about this. I'd especially like to see JW Player
> participate in the discussion, since aggregation is already supported by SQE.
>
> Thanks for reading quite a long thread.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
