Re: [DISCUSS] SQL Syntax for Table API StatementSet

Fabian Hueske Mon, 22 Jun 2020 08:24:23 -0700

Thanks for the discussion Godfrey and Timo,

I like the syntax proposed by Jark and Timo:


BEGIN STATEMENT SET;
   INSERT INTO ...;
   INSERT INTO ...;
END;

(I didn't pay attention and didn't mean to propose START over BEGIN. I just
wanted to make the point that the syntax should make it clear that a
statement set is started).

I think the important questions about streaming/batch queries and
sync/async execution need to be discussed and solved.
However, I think these points are orthogonal to the question about
supporting statement sets.
These issues exist today (without a SQL syntax for statement sets) and IMO
such a syntax doesn't make the situation any worse or better (assuming that
we agree on the limitation that all statements in a set are either
streaming or batch queries).
As I said before, from Flink's point of view a statement set can be
replaced by a single INSERT INTO query (either streaming or batch,
depending on the type of queries in the set).
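
To make this concrete, here is a minimal Python sketch (not Flink code) of how a text-based client might collect a BEGIN STATEMENT SET; ... END; block into one unit before submitting it as a single job. The helper name `parse_statement_set`, the table names, and the naive split on `;` are assumptions for illustration only. A real parser would also have to handle semicolons inside string literals and comments.

```python
from typing import List


def parse_statement_set(script: str) -> List[str]:
    """Return the INSERT statements inside one BEGIN STATEMENT SET; ... END; block.

    Hypothetical helper for illustration only. It splits naively on ';',
    so semicolons inside string literals or comments are not handled.
    """
    # Collapse whitespace in each fragment and drop empty fragments.
    statements = [" ".join(part.split()) for part in script.split(";")]
    statements = [s for s in statements if s]

    if len(statements) < 2:
        raise ValueError("expected BEGIN STATEMENT SET; ... END;")
    if statements[0].upper() != "BEGIN STATEMENT SET":
        raise ValueError("block must start with BEGIN STATEMENT SET")
    if statements[-1].upper() != "END":
        raise ValueError("block must end with END")

    body = statements[1:-1]
    if not body:
        raise ValueError("statement set is empty")
    for stmt in body:
        # The syntax discussed in this thread only groups INSERT INTO statements.
        if not stmt.upper().startswith("INSERT INTO "):
            raise ValueError(f"only INSERT INTO is allowed in a statement set: {stmt!r}")
    return body


# Hypothetical tables; the whole block is handed over as one unit.
script = """
BEGIN STATEMENT SET;
   INSERT INTO sink_a SELECT * FROM src;
   INSERT INTO sink_b SELECT id FROM src;
END;
"""
inserts = parse_statement_set(script)
```

The returned list plays the role of the single unit: the client would pass all statements on together rather than executing them one by one, so no ordering or sync/async coordination between them is needed.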

Best, Fabian


On Mon, 22 Jun 2020 at 10:55, Timo Walther <[email protected]> wrote:

> Hi Godfrey,
>
> 1) Of course we should have unified behavior for API and SQL file.
> However, this doesn't mean that `executeSql` needs to become blocking or
> support multi-statements. In a programmatic API, async is more useful as
> a user can control long running jobs (regardless of batch or streaming).
> Sync behavior can be expressed on an async API (e.g.
> TableResult.await()). If we support multi-statements in the API, it will
> not be supported through `executeSql`; this part of the API was
> finalized in the last release. We need to come up with a new API method.
>
> 3) I think forcing async execution also for multiline batch queries in
> SQL can be future work. Either we enable those using a flag or special
> syntax in a SQL file. Or do we want this flexibility already in the
> first multi-statement support version?
>
> Regards,
> Timo
>
> On 17.06.20 15:27, godfrey he wrote:
> > Hi Fabian, Jark, Timo
> >
> > Thanks for the suggestions.
> >
> > Regarding the SQL syntax, BEGIN is more popular than START. I'm fine with
> > the syntax Timo suggested.
> >
> > Regarding whether this should be implemented in Flink's SQL core, I think
> > there are three things to consider:
> >
> > First, do we need to unify the default behavior of the API and SQL files?
> > The `TableEnvironment#executeSql` and `StatementSet#execute` methods
> > execute asynchronously for both batch and streaming, which means they
> > just submit the job and then return a `TableResult`.
> > For batch processing elsewhere (e.g. Hive, traditional databases),
> > however, the default behavior is sync mode.
> > So this behavior differs from the APIs. I think it would be better to
> > unify the default behavior.
> >
> > Second, how do we determine the execution behavior of each statement in
> > a file which contains both batch SQL and streaming SQL? Currently, we
> > have a flag that tells the planner whether the TableEnvironment is a
> > batch env or a stream env, which determines the default behavior. We
> > want to remove the flag and unify the TableEnvironment in the future.
> > Then a TableEnvironment can execute both batch SQL and streaming SQL.
> > Timo and I had a discussion about this on Slack: for DML & DQL, if a
> > statement has keywords like `EMIT STREAM`, it is streaming SQL and will
> > be executed in async mode; otherwise it is batch SQL and will be
> > executed in sync mode.
> >
> > Third, how do we flexibly support execution mode switching for batch
> > SQL? For streaming SQL, all DMLs & DQLs should be in async mode because
> > the job may never finish.
> > For batch SQL, I think both modes are needed. I know some platforms
> > execute batch SQL in async mode and then continuously monitor the job
> > status. Do we need to introduce a `set execute-mode=xx` command or new
> > SQL syntax like `START SYNC EXECUTION`?
> >
> > For sql-client or other projects, we can easily decide what behavior an
> > app can support.
> > Just as Jark said, many downstream projects have the same requirement for
> > multiple statement support, but they may have different execution
> > behaviors. It would be great if Flink could support flexible execution
> > modes. Alternatively, Flink core could just define the syntax, provide
> > the parser, and support a default execution mode.
> > The downstream projects can then use the APIs and parsed results to
> > decide how to execute a SQL statement.
> >
> > Best,
> > Godfrey
> >
> > On Wed, 17 Jun 2020 at 6:32 PM, Timo Walther <[email protected]> wrote:
> >
> >> Hi Fabian,
> >>
> >> thanks for the proposal. I agree that we should have consensus on the
> >> SQL syntax as well and thus finalize the concepts introduced in FLIP-84.
> >>
> >> I would favor Jark's proposal. I would like to propose the following
> >> syntax:
> >>
> >> BEGIN STATEMENT SET;
> >>     INSERT INTO ...;
> >>     INSERT INTO ...;
> >> END;
> >>
> >> 1) BEGIN and END are commonly used for blocks in SQL.
> >>
> >> 2) We should not start mixing START/BEGIN for different kinds of blocks,
> >> because that can also be confusing for users. There is no additional
> >> helpful semantic in using START over BEGIN.
> >>
> >> 3) Instead, we should rather parameterize the block statement with
> >> `STATEMENT SET` and keep the END of the block simple (also similar to
> >> CASE ... WHEN ... END).
> >>
> >> 4) If we look at Jark's example in SQL Server, the BEGIN is also
> >> parameterized by `BEGIN { TRAN | TRANSACTION }`.
> >>
> >> 5) Also, in Java curly braces are used for classes, methods, and loops
> >> alike, for different purposes parameterized by the preceding code.
> >>
> >> Regards,
> >> Timo
> >>
> >>
> >> On 17.06.20 11:36, Fabian Hueske wrote:
> >>> Thanks for joining this discussion Jark!
> >>>
> >>> This feature is a bit different from BEGIN TRANSACTION / COMMIT and
> >>> BEGIN / END.
> >>>
> >>> The only commonality is that all three group multiple statements.
> >>> * BEGIN TRANSACTION / COMMIT creates a transactional context that
> >>> guarantees atomicity, consistency, and isolation. Statements and
> >>> queries are sequentially executed.
> >>> * BEGIN / END defines a block of statements just like curly braces
> >>> ({ and }) do in Java. The statements (which can also include variable
> >>> definitions and printing) are sequentially executed.
> >>> * A statement set defines a group of statements that are optimized
> >>> together and jointly executed at the same time, i.e., there is no
> >>> sequence or order.
> >>> A statement set (consisting of multiple INSERT INTO statements) behaves
> >>> just like a single INSERT INTO statement.
> >>> Everywhere where an INSERT INTO statement can be executed, it should be
> >>> possible to execute a statement set consisting of multiple INSERT INTO
> >>> statements.
> >>> That's also why I think that statement sets are orthogonal to
> >>> multi-statement execution.
> >>>
> >>> As I said before, I'm happy to discuss syntax proposals for statement
> >>> sets.
> >>> However, I think a BEGIN / END syntax for statement sets would confuse
> >>> users who know this syntax from MySQL, SQL Server, or another DBMS.
> >>>
> >>> Thanks,
> >>> Fabian
> >>>
> >>>
> >>> On Tue, 16 Jun 2020 at 05:07, Jark Wu <[email protected]> wrote:
> >>>
> >>>> Hi Fabian,
> >>>>
> >>>> Thanks for starting this discussion. I think this is a very important
> >>>> syntax to support file mode and multi-statement for SQL Client.
> >>>> I'm +1 to introduce a syntax to group SQL statements to execute
> >>>> together.
> >>>>
> >>>> As a reference, traditional database systems also have similar syntax,
> >>>> such as "START/BEGIN TRANSACTION ... COMMIT" to group statements as a
> >>>> transaction [1], and also "BEGIN ... END" [2] [3] to group a set of
> >>>> SQL statements that execute together.
> >>>>
> >>>> Maybe we can also use "BEGIN ... END" syntax which is much simpler?
> >>>>
> >>>> Regarding where to implement, I also prefer to have it in Flink SQL
> >>>> core. Here are some reasons from my side:
> >>>> 1) I think many downstream projects (e.g. Zeppelin) will have the same
> >>>> requirement. It would be better to have it in core instead of having
> >>>> users reinvent the wheel.
> >>>> 2) Having it in SQL CLI means it is a standard syntax to support
> >>>> statement sets in Flink. So I think it makes sense to have it in core
> >>>> too; otherwise, it looks like a broken feature.
> >>>>       In 1.10, CREATE VIEW is only supported in SQL CLI, not in
> >>>> TableEnvironment, which confuses many users.
> >>>> 3) Currently, we are moving statement parsing to use sql-parser
> >>>> (FLINK-17728). Calcite has good support for parsing multi-statements.
> >>>>       It will be tricky to parse multi-statements only in SQL Client.
> >>>>
> >>>> Best,
> >>>> Jark
> >>>>
> >>>> [1]:
> >>>> https://docs.microsoft.com/en-us/sql/t-sql/language-elements/begin-transaction-transact-sql?view=sql-server-ver15
> >>>> [2]:
> >>>> https://www.sqlservertutorial.net/sql-server-stored-procedures/sql-server-begin-end/
> >>>> [3]: https://dev.mysql.com/doc/refman/8.0/en/begin-end.html
> >>>>
> >>>> On Mon, 15 Jun 2020 at 20:50, Fabian Hueske <[email protected]> wrote:
> >>>>
> >>>>> Hi everyone,
> >>>>>
> >>>>> FLIP-84 [1] added the concept of a "statement set" to group multiple
> >>>>> INSERT INTO statements (SQL or Table API) together. The statements in
> >>>>> a statement set are jointly optimized and executed as a single Flink
> >>>>> job.
> >>>>>
> >>>>> I would like to start a discussion about a SQL syntax to group
> >>>>> multiple INSERT INTO statements in a statement set. The use case would
> >>>>> be to expose the statement set feature to a solely text based client
> >>>>> for Flink SQL such as Flink's SQL CLI [2].
> >>>>>
> >>>>> During the discussion of FLIP-84, we had briefly talked about such a
> >>>>> syntax [3].
> >>>>>
> >>>>> START STATEMENT SET;
> >>>>> INSERT INTO ... SELECT ...;
> >>>>> INSERT INTO ... SELECT ...;
> >>>>> ...
> >>>>> END STATEMENT SET;
> >>>>>
> >>>>> We didn't follow up on this proposal, to keep the focus on the
> >>>>> FLIP-84 Table API changes and to not dive into a discussion about
> >>>>> multiline SQL query support [4].
> >>>>>
> >>>>> While this feature is clearly based on multiple SQL queries, I think
> >>>>> it is a bit different from what we usually understand as multiline SQL
> >>>>> support. That's because a statement set ends up being a single Flink
> >>>>> job. Hence, there is no need on the Flink side to coordinate the
> >>>>> execution of multiple jobs (incl. the discussion about blocking or
> >>>>> async execution of queries).
> >>>>> Flink would treat the queries in a STATEMENT SET as a single query.
> >>>>>
> >>>>> I would like to start a discussion about supporting the [START|END]
> >>>>> STATEMENT SET syntax (or a different syntax with equivalent semantics)
> >>>>> in Flink.
> >>>>> I don't have a strong preference whether this should be implemented
> >>>>> in Flink's SQL core or be a purely client side implementation in the
> >>>>> CLI client. It would be good though to have parser support in Flink
> >>>>> for this.
> >>>>>
> >>>>> What do others think?
> >>>>>
> >>>>> [1]
> >>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> >>>>> [2]
> >>>>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/sqlClient.html
> >>>>> [3]
> >>>>> https://docs.google.com/document/d/1ueLjQWRPdLTFB_TReAyhseAX-1N3j4WYWD0F02Uau0E/edit#heading=h.al86t1h4ecuv
> >>>>> [4]
> >>>>> https://lists.apache.org/thread.html/rf494e227c47010c91583f90eeaf807d3a4c3eb59d105349afd5fdc31%40%3Cdev.flink.apache.org%3E
