Thanks for the discussion, Godfrey and Timo.

I like the syntax proposed by Jark and Timo:

BEGIN STATEMENT SET;
INSERT INTO ...;
INSERT INTO ...;
END;

(I didn't pay attention and didn't mean to propose START over BEGIN. I just
wanted to make the point that the syntax should make it clear that a
statement set is started.)

I think the important questions about streaming/batch queries and sync/async
execution need to be discussed and solved. However, I think these points are
orthogonal to the question of supporting statement sets. These issues exist
today (without a SQL syntax for statement sets), and IMO such a syntax doesn't
make the situation any worse or better (assuming that we agree on the
limitation that all statements in a set are either streaming or batch queries).

As I said before, from Flink's point of view a statement set can be replaced
by a single INSERT INTO query (either streaming or batch, depending on the
type of queries in the set).

Best,
Fabian

On Mon, 22 June 2020 at 10:55, Timo Walther <twal...@apache.org> wrote:

> Hi Godfrey,
>
> 1) Of course we should have unified behavior for API and SQL file.
> However, this doesn't mean that `executeSql` needs to become blocking or
> support multi-statements. In a programmatic API, async is more useful as
> a user can control long running jobs (regardless of batch or streaming).
> Sync behavior can be expressed on an async API (e.g.
> TableResult.await()). If we support multi-statements in the API, it will
> not be supported through `executeSql`, because this part of the API was
> finalized in the last release. We need to come up with a new API method.
>
> 3) I think forcing async execution also for multiline batch queries in
> SQL can be future work. Either we enable those using a flag or special
> syntax in a SQL file. Or do we want this flexibility already in the
> first multi-statement support version?
>
> Regards,
> Timo
>
> On 17.06.20 15:27, godfrey he wrote:
> > Hi Fabian, Jark, Timo
> >
> > Thanks for the suggestions.
> >
> > Regarding the SQL syntax, BEGIN is more popular than START. I'm fine with
> > the syntax Timo suggested.
> >
> > Regarding whether this should be implemented in Flink's SQL core, I think
> > there are three things to consider:
> >
> > First, do we need to unify the default behavior of the API and SQL files?
> > The execution of the `TableEnvironment#executeSql` method and the
> > `StatementSet#execute` method is asynchronous for both batch and
> > streaming, which means these methods just submit the job and then return
> > a `TableResult`.
> > For batch processing (e.g. Hive, traditional databases), however, the
> > default behavior is sync mode.
> > So this behavior is different from the APIs. I think it would be better
> > to unify the default behavior.
> >
> > Second, how do we determine the execution behavior of each statement in a
> > file that contains both batch SQL and streaming SQL? Currently, we have a
> > flag that tells the planner whether the TableEnvironment is a batch env
> > or a stream env, which determines the default behavior. We want to remove
> > the flag and unify the TableEnvironment in the future. Then
> > TableEnvironment can execute both batch SQL and streaming SQL. Timo and I
> > had a discussion about this on Slack: for DML & DQL, if a statement has
> > keywords like `EMIT STREAM`, it's streaming SQL and will be executed in
> > async mode; otherwise it's batch SQL and will be executed in sync mode.
> >
> > Third, how do we flexibly support execution mode switching for batch SQL?
> > For streaming SQL, all DMLs & DQLs should be in async mode because the
> > job may never finish.
> > For batch SQL, I think both modes are needed. I know some platforms
> > execute batch SQL in async mode and then continuously monitor the job
> > status. Do we need to introduce a `set execute-mode=xx` command or new
> > SQL syntax like `START SYNC EXECUTION`?
> >
> > For sql-client or other projects, we can easily decide what behavior an
> > app can support.
> > Just as Jark said, many downstream projects have the same requirement for
> > multiple statement support, but they may have different execution
> > behaviors. It would be great if Flink could support flexible execution
> > modes.
> > Alternatively, Flink core just defines the syntax, provides a parser, and
> > supports a default execution mode.
> > The downstream projects can then use the APIs and parsed results to
> > decide how to execute a SQL statement.
> >
> > Best,
> > Godfrey
> >
> > Timo Walther <twal...@apache.org> wrote on Wed, 17 June 2020 at 6:32 PM:
> >
> >> Hi Fabian,
> >>
> >> thanks for the proposal. I agree that we should have consensus on the
> >> SQL syntax as well and thus finalize the concepts introduced in FLIP-84.
> >>
> >> I would favor Jark's proposal. I would like to propose the following
> >> syntax:
> >>
> >> BEGIN STATEMENT SET;
> >>    INSERT INTO ...;
> >>    INSERT INTO ...;
> >> END;
> >>
> >> 1) BEGIN and END are commonly used for blocks in SQL.
> >>
> >> 2) We should not start mixing START/BEGIN for different kinds of blocks,
> >> because that can also be confusing for users. There is no additional
> >> helpful semantic in using START over BEGIN.
> >>
> >> 3) Instead, we should rather parameterize the block statement with
> >> `STATEMENT SET` and keep the END of the block simple (also similar to
> >> CASE ... WHEN ... END).
> >>
> >> 4) If we look at Jark's example in SQL Server, the BEGIN is also
> >> parameterized by `BEGIN { TRAN | TRANSACTION }`.
> >>
> >> 5) Also in Java, curly braces are used for classes, methods, and
> >> loops for different purposes, parameterized by the preceding code.
> >>
> >> Regards,
> >> Timo
> >>
> >>
> >> On 17.06.20 11:36, Fabian Hueske wrote:
> >>> Thanks for joining this discussion Jark!
> >>>
> >>> This feature is a bit different from BEGIN TRANSACTION / COMMIT and
> >>> BEGIN / END.
> >>>
> >>> The only commonality is that all three group multiple statements.
> >>> * BEGIN TRANSACTION / COMMIT creates a transactional context that
> >>> guarantees atomicity, consistency, and isolation. Statements and
> >>> queries are sequentially executed.
> >>> * BEGIN / END defines a block of statements just like curly braces
> >>> ({ and }) do in Java. The statements (which can also include variable
> >>> definitions and printing) are sequentially executed.
> >>> * A statement set defines a group of statements that are optimized
> >>> together and jointly executed at the same time, i.e., there is no
> >>> sequence or order.
> >>>
> >>> A statement set (consisting of multiple INSERT INTO statements) behaves
> >>> just like a single INSERT INTO statement.
> >>> Everywhere an INSERT INTO statement can be executed, it should be
> >>> possible to execute a statement set consisting of multiple INSERT INTO
> >>> statements.
> >>> That's also why I think that statement sets are orthogonal to
> >>> multi-statement execution.
> >>>
> >>> As I said before, I'm happy to discuss syntax proposals for statement
> >>> sets.
> >>> However, I think a BEGIN / END syntax for statement sets would confuse
> >>> users who know this syntax from MySQL, SQL Server, or another DBMS.
> >>>
> >>> Thanks,
> >>> Fabian
> >>>
> >>>
> >>> On Tue, 16 June 2020 at 05:07, Jark Wu <imj...@gmail.com> wrote:
> >>>
> >>>> Hi Fabian,
> >>>>
> >>>> Thanks for starting this discussion. I think this is a very important
> >>>> syntax to support file mode and multi-statement for SQL Client.
> >>>> I'm +1 to introduce a syntax to group SQL statements to execute
> >>>> together.
> >>>>
> >>>> As a reference, traditional database systems also have similar syntax,
> >>>> such as "START/BEGIN TRANSACTION ... COMMIT" to group statements as a
> >>>> transaction [1],
> >>>> and also "BEGIN ... END" [2] [3] to group a set of SQL statements that
> >>>> execute together.
> >>>>
> >>>> Maybe we can also use "BEGIN ... END" syntax which is much simpler?
> >>>>
> >>>> Regarding where to implement, I also prefer to have it in Flink SQL
> >>>> core, here are some reasons from my side:
> >>>> 1) I think many downstream projects (e.g. Zeppelin) will have the same
> >>>> requirement. It would be better to have it in core instead of users
> >>>> reinventing the wheel.
> >>>> 2) Having it in SQL CLI means it is a standard syntax to support
> >>>> statement sets in Flink. So I think it makes sense to have it in core
> >>>> too, otherwise it looks like a broken feature.
> >>>>     In 1.10, CREATE VIEW is only supported in SQL CLI, not supported in
> >>>> TableEnvironment, which confuses many users.
> >>>> 3) Currently, we are moving statement parsing to use sql-parser
> >>>> (FLINK-17728). Calcite has good support for parsing multi-statements.
> >>>>     It will be tricky to parse multi-statements only in SQL Client.
> >>>>
> >>>> Best,
> >>>> Jark
> >>>>
> >>>> [1]:
> >>>> https://docs.microsoft.com/en-us/sql/t-sql/language-elements/begin-transaction-transact-sql?view=sql-server-ver15
> >>>> [2]:
> >>>> https://www.sqlservertutorial.net/sql-server-stored-procedures/sql-server-begin-end/
> >>>> [3]: https://dev.mysql.com/doc/refman/8.0/en/begin-end.html
> >>>>
> >>>> On Mon, 15 Jun 2020 at 20:50, Fabian Hueske <fhue...@gmail.com> wrote:
> >>>>
> >>>>> Hi everyone,
> >>>>>
> >>>>> FLIP-84 [1] added the concept of a "statement set" to group multiple
> >>>>> INSERT INTO statements (SQL or Table API) together. The statements in
> >>>>> a statement set are jointly optimized and executed as a single Flink
> >>>>> job.
> >>>>>
> >>>>> I would like to start a discussion about a SQL syntax to group
> >>>>> multiple INSERT INTO statements in a statement set. The use case would
> >>>>> be to expose the statement set feature to a solely text-based client
> >>>>> for Flink SQL such as Flink's SQL CLI [2].
> >>>>>
> >>>>> During the discussion of FLIP-84, we had briefly talked about such a
> >>>>> syntax [3].
> >>>>>
> >>>>> START STATEMENT SET;
> >>>>> INSERT INTO ... SELECT ...;
> >>>>> INSERT INTO ... SELECT ...;
> >>>>> ...
> >>>>> END STATEMENT SET;
> >>>>>
> >>>>> We didn't follow up on this proposal, to keep the focus on the FLIP-84
> >>>>> Table API changes and to not dive into a discussion about multiline
> >>>>> SQL query support [4].
> >>>>>
> >>>>> While this feature is clearly based on multiple SQL queries, I think
> >>>>> it is a bit different from what we usually understand as multiline SQL
> >>>>> support.
> >>>>> That's because a statement set ends up as a single Flink job. Hence,
> >>>>> there is no need on the Flink side to coordinate the execution of
> >>>>> multiple jobs (incl. the discussion about blocking or async execution
> >>>>> of queries).
> >>>>> Flink would treat the queries in a STATEMENT SET as a single query.
> >>>>>
> >>>>> I would like to start a discussion about supporting the [START|END]
> >>>>> STATEMENT SET syntax (or a different syntax with equivalent semantics)
> >>>>> in Flink.
> >>>>> I don't have a strong preference whether this should be implemented in
> >>>>> Flink's SQL core or be a purely client-side implementation in the CLI
> >>>>> client. It would be good though to have parser support in Flink for
> >>>>> this.
> >>>>>
> >>>>> What do others think?
> >>>>>
> >>>>> [1]
> >>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
> >>>>> [2]
> >>>>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/sqlClient.html
> >>>>> [3]
> >>>>> https://docs.google.com/document/d/1ueLjQWRPdLTFB_TReAyhseAX-1N3j4WYWD0F02Uau0E/edit#heading=h.al86t1h4ecuv
> >>>>> [4]
> >>>>> https://lists.apache.org/thread.html/rf494e227c47010c91583f90eeaf807d3a4c3eb59d105349afd5fdc31%40%3Cdev.flink.apache.org%3E
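
To make the "purely client-side" option from the thread more concrete, here is a minimal sketch of how a text-based client might group the statements of a SQL script before handing them to Flink. It is only an illustration under stated assumptions: the function name `group_statements` is hypothetical, the script is split naively on `;` (so semicolons inside string literals or comments are not handled), and the rules that sets cannot be nested and may contain only INSERT INTO statements are assumptions drawn from the discussion, not from any Flink parser.

```python
import re
from typing import List, Union

# Hypothetical markers for the syntax discussed in the thread.
_BEGIN = re.compile(r"^BEGIN\s+STATEMENT\s+SET$", re.IGNORECASE)
_END = re.compile(r"^END$", re.IGNORECASE)


def group_statements(script: str) -> List[Union[str, List[str]]]:
    """Group a SQL script into single statements and statement sets.

    Each item of the result is either one statement (str) or a statement
    set (list of INSERT INTO statements meant to be submitted as one job).
    """
    result: List[Union[str, List[str]]] = []
    current_set = None  # None while we are outside a statement set

    # Naive split: assumes no ';' inside string literals or comments.
    for raw in script.split(";"):
        stmt = " ".join(raw.split())  # collapse whitespace and newlines
        if not stmt:
            continue
        if _BEGIN.match(stmt):
            if current_set is not None:
                raise ValueError("nested statement sets are not allowed")
            current_set = []
        elif _END.match(stmt):
            if current_set is None:
                raise ValueError("END without BEGIN STATEMENT SET")
            result.append(current_set)
            current_set = None
        elif current_set is not None:
            # Assumption: a statement set holds only INSERT INTO statements.
            if not stmt.upper().startswith("INSERT INTO"):
                raise ValueError("only INSERT INTO is allowed in a statement set")
            current_set.append(stmt)
        else:
            result.append(stmt)

    if current_set is not None:
        raise ValueError("unterminated statement set")
    return result
```

A client built along these lines could submit each plain string as its own statement and each list as one jointly optimized job, which matches the point made above that a statement set behaves like a single INSERT INTO query from Flink's point of view.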