Hi Jark,
sorry that I didn't write back earlier. I wanted to talk to Fabian first
about this. In general, according to Calcite's plans, even SQL queries
containing the "STREAM" keyword are regular standard SQL. In theory we
could omit the "STREAM" keyword as long as it is guaranteed that the
generated logical plans look the same. So I'm not against having the
same grammar for both batch and streaming queries. However, I think we
should contribute code to Calcite if the logical representation is not
there already for operators we need. We need to research how far
Calcite's development has progressed. We can implement windows via
user-defined functions, as is also done in the Calcite streaming design document.
It would be very interesting for the upcoming design phase if you could
show us how you implemented your Blink SQL. For instance, how do you
define windows there?
Regards,
Timo
On 18/08/16 at 16:34, Aljoscha Krettek wrote:
Hi,
I personally would like it a lot if the SQL queries for batch and
stream programs looked the same. With the decision to move the Table
API on top of Calcite and also use the Calcite SQL parser, Flink is
somewhat tied to Calcite, so I don't know whether we can add our own
window constructs and teach the parser to properly read them.
Maybe Fabian and Timo have more insights here since they worked on the
move to Calcite.
Cheers,
Aljoscha
+Timo looping him in directly
On Tue, 16 Aug 2016 at 09:29 Jark Wu <wuchong...@alibaba-inc.com> wrote:
Hi,
Currently, Flink uses Calcite for SQL parsing, so we use the
StreamSQL grammar proposed by Calcite [1], which requires the
`STREAM` keyword in SQL. For example, `SELECT * FROM Orders` is
regular standard SQL and will be translated to a batch job. To
declare a stream job, you have to add the `STREAM` keyword:
`SELECT STREAM * FROM Orders`.
I'm wondering why we distinguish between StreamSQL and BatchSQL
grammar. We already have separate high-level APIs for batch
(DataSet) and stream (DataStream), and we have a unified Table
API for batch and stream (that's great!). Why do we have to
separate them again in SQL?
I hope we can manipulate stream data like a table. For example,
with `SELECT * FROM Orders`, if Orders is a table (or the query
runs in a batch execution env), it's a batch job. If Orders is a
stream (or the query runs in a stream execution env), it's a
stream job. The grammar of StreamSQL and BatchSQL would be exactly
the same. That is what we did in Blink SQL.
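As a rough illustration of this idea, here is a toy Python sketch in which the job type comes from the kind of source and not from a keyword. It is not Blink or Flink code: `CATALOG`, `choose_job_type`, and the rule that a single stream source makes the whole query a stream job are all assumptions made for the example.

```python
import re

# Hypothetical catalog: maps a source name to its kind ("table" or "stream").
CATALOG = {"Orders": "stream", "Products": "table"}


def choose_job_type(sql, catalog):
    """Return 'stream' if any referenced source is a stream, else 'batch'.

    Toy logic only: it picks up the names after FROM/JOIN with a regex
    and is not a real SQL parser.
    """
    sources = re.findall(r"\b(?:FROM|JOIN)\s+(\w+)", sql, flags=re.IGNORECASE)
    if not sources:
        raise ValueError("no source found in query")
    kinds = [catalog[name] for name in sources]  # KeyError for unknown sources
    # Assumption: one stream input is enough to make this a stream job.
    return "stream" if "stream" in kinds else "batch"
```

The same query text, `SELECT * FROM Orders`, is classified differently depending only on how Orders is registered in the catalog.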
The benefits of unifying the grammar:
1. Easy to use StreamSQL for anyone who knows regular SQL. There
is no difference between StreamSQL and regular SQL.
2. Not blocked by Calcite. Currently, Calcite StreamSQL is not
fully supported: it lacks stream-to-stream JOIN, window
aggregates, aggregates without windows, etc. We might have to
wait for Calcite to support them before we can start work. All of
these except windows are already supported by regular SQL, and we
can implement windows via user-defined functions. So if we use
regular SQL instead of StreamSQL, we can start working on it
right now without waiting for Calcite.
3. Blink SQL can be merged back into the community to accelerate
the evolution of Flink SQL. Blink SQL has already done most of
this work: we implemented UDF/UDTF/UDAF, aggregates with and
without windows, stream-to-stream JOIN, and so on.
4. Windows can also work in batch jobs.
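To make the "window via user-defined function" idea from points 2 and 4 more concrete, here is a minimal Python sketch of a tumbling window expressed as an ordinary scalar function plus a plain GROUP BY. It is only an illustration under assumptions: `tumble_start`, `count_per_window`, and the `ts` field are hypothetical names, and this is not how Blink SQL or Calcite actually define windows.

```python
from collections import defaultdict


def tumble_start(timestamp_ms, size_ms):
    """Scalar UDF: start of the tumbling window that contains timestamp_ms."""
    return timestamp_ms - (timestamp_ms % size_ms)


def count_per_window(rows, size_ms):
    """Roughly: SELECT tumble_start(ts, size), COUNT(*) FROM rows
    GROUP BY tumble_start(ts, size), evaluated over a list of dicts."""
    counts = defaultdict(int)
    for row in rows:
        counts[tumble_start(row["ts"], size_ms)] += 1
    return dict(counts)
```

Because the window is just a grouping key computed per row, the same function can be applied to a bounded list of rows (a batch job) or row by row as records arrive.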
Just my thoughts :)
What do you think about this?
[1] https://calcite.apache.org/docs/stream.html
- Jark Wu
--
Kind Regards
Timo Walther