> Personally speaking, I think 3) is most likely to be adopted by the Flink
> community, since TRIGGER/SAVEPOINT are already reserved keywords in Flink
> SQL[1]. Should we propose only 3) to the Flink community?
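For concreteness, a gateway-side dispatcher for the command-style statements in 3) could be sketched roughly as below. This is purely illustrative, not actual Kyuubi or Flink code; the patterns and function names are made up for the sketch:

```python
import re

# Hypothetical sketch: recognize the command-style savepoint statements
# (option 3) before delegating everything else to the regular SQL parser.
SAVEPOINT_COMMANDS = [
    (re.compile(r"^TRIGGER\s+SAVEPOINT\s+(\S+)$", re.IGNORECASE), "trigger"),
    (re.compile(r"^SHOW\s+SAVEPOINTS\s+(\S+)$", re.IGNORECASE), "show"),
    (re.compile(r"^REMOVE\s+SAVEPOINT\s+(\S+)$", re.IGNORECASE), "remove"),
]

def classify(stmt: str):
    """Return (operation, argument) for a savepoint command, else None."""
    stmt = stmt.strip().rstrip(";")
    for pattern, op in SAVEPOINT_COMMANDS:
        m = pattern.match(stmt)
        if m:
            return op, m.group(1)
    return None  # fall through to the normal SQL parser
```

The point of the sketch is that the new statements can be intercepted without touching the rest of the SQL grammar, which is why new reserved keywords are the main cost of this option.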
I respect your opinion and experience in the Flink community, and if there
are other voices in the Flink community, we can offer other options. I think
the Kyuubi community can gain a lot of benefits by following the
upstream-first philosophy.

Thanks,
Cheng Pan

On Wed, Mar 30, 2022 at 6:38 PM Paul Lam <paullin3...@gmail.com> wrote:
>
> Hi Cheng,
>
> Thanks a lot for your input! IMHO, now the whole design gets pretty clear.
>
> I'll try to re-summarize the plan as follows; please correct me if I'm wrong.
>
> 1. Discuss savepoint SQL syntax with the Flink community
>
> There are 3 types of syntax:
>
> 1) ANSI SQL style with extended objects
>
> SHOW SAVEPOINTS <query_id>
> CREATE SAVEPOINT <query_id>
> DROP SAVEPOINT <savepoint_id>
>
> 2) ANSI SQL style with stored procedures
>
> CALL trigger_savepoint(…)
> CALL show_savepoints(…)
> CALL drop_savepoint(...)
>
> 3) Command-like style
>
> TRIGGER SAVEPOINT <query_id>
> SHOW SAVEPOINTS <query_id>
> REMOVE SAVEPOINT <savepoint_path>
>
> If we reach agreement with the Flink community on one of them, we adopt
> that syntax. There might be some duplicated effort in the short term,
> but eventually we will converge, and reuse most if not all of the
> functionality that Flink provides.
>
> Personally speaking, I think 3) is most likely to be adopted by the Flink
> community, since TRIGGER/SAVEPOINT are already reserved
> keywords in Flink SQL[1]. Should we propose only 3) to the Flink
> community?
>
> 2.
> Draft a KIP about savepoint management
>
> The TODO list as far as we can see:
> 1) Support retrieving the query ID in the Flink engine
> 2) Introduce a SQL layer to support new SQL syntax if needed (compatible
> with Flink SQL)
> 3) Support savepoint-related operations in the Flink engine
> 4) Extend Beeline to support query IDs
>
> Best,
> Paul Lam
>
> > On Mar 30, 2022, at 16:01, Cheng Pan <pan3...@gmail.com> wrote:
> >
> > Thanks Paul, I agree that we’ve reached a consensus on the high level: 1)
> > use SQL to manipulate savepoints, 2) follow the upstream-first
> > philosophy in SQL syntax and RPC protocol to achieve the best
> > compatibility and user experience.
> >
> > Specifically for the details, I'll add some comments.
> >
> >> 1) ANSI SQL
> >> `CALL trigger_savepoint($query_id)`
> >> `CALL show_savepoint($query_id)`
> >
> > We could give more flexibility to the concept of ANSI SQL-like syntax.
> >
> > For instance, we have
> >
> > SHOW TABLES [LIKE ...]
> > ALTER TABLE <table_name> SET xxx
> > ALTER TABLE ADD ...
> > DROP TABLE <table_name>
> > SELECT xxx FROM <table_name>
> > DESC <table_name>
> >
> > We can extend SQL in the same style for savepoints, e.g.
> >
> > SHOW SAVEPOINTS <query_id>
> > CREATE SAVEPOINT <query_id>
> > DROP SAVEPOINT <query_id>
> > SELECT ... FROM <system.savepoint_table_name> WHERE ...
> > DESC <query_id>
> >
> > One example is DistSQL[1]
> >
> > The command style is specific to introducing new SQL action keywords, e.g.
> >
> > OPTIMIZE <table_name>, VACUUM <table_name>, KILL <query_id>,
> > KILL QUERY <query_id>
> >
> > Usually, different engines/databases may have different syntax for the
> > same behavior, or different behavior for the same syntax. Unless the
> > syntax has been adopted by the upstream, I prefer to use
> >
> > CALL <procedure_name>(arg1, arg2, ...)
> >
> > to avoid conflicts, and to switch to the official syntax once the
> > upstream introduces the new syntax.
> >
> >> There are 2 approaches to return the query ID to the clients.
> >>
> >> 1) TGetQueryIdReq/Resp
> >> The clients need to request the query ID when a query is finished.
> >> Given that the original semantics of the Req is to return all query IDs
> >> in the session[1], we may need to change it to "the ID of the latest
> >> query", or else it would be difficult for users to figure out which ID
> >> is the right one.
> >>
> >> 2) Return it in the result set
> >> This approach is straightforward. Flink returns -1 as the affected rows,
> >> which is not very useful. We can simply replace that with the query ID.
> >
> > Having a look at TGetQueryIdReq/Resp, I think we can simplify the
> > procedure to:
> >
> > 1. The client sends an ExecuteQueryReq.
> > 2. The server returns an OpHandle to the client immediately.
> > 3. The client sends TGetQueryIdReq(OpHandle) to ask for the QueryId
> > periodically until it gets a legal result.
> > 4. The server returns the corresponding TGetQueryIdResp(QueryId) if
> > available; otherwise it returns a predefined QueryId constant, e.g.
> > 'UNDEFINED_QUERY_ID', if the statement has not been accepted by the
> > engine (there is no queryId for the stmt yet).
> >
> > [1] https://github.com/apache/shardingsphere/releases/tag/5.1.0
> >
> > Thanks,
> > Cheng Pan
> >
> > On Tue, Mar 29, 2022 at 7:13 PM 林小铂 <link3...@163.com> wrote:
> >>
> >> Hi team,
> >>
> >> Sorry for the late follow-up. It took me some time to do some research.
> >>
> >> TL;DR It’s good to express savepoints in SQL statements. We should join
> >> efforts with the Flink community to discuss SQL syntax for savepoint
> >> statements. There are mainly two styles of SQL syntax to discuss:
> >> ANSI SQL and command-like. The rest are implementation details, such as
> >> how to return the query ID.
> >>
> >> We had an offline discussion on DingTalk last week, and I believe we’ve
> >> reached a consensus on some issues.
> >>
> >> As pointed out in the previous mails, we should consider:
> >> 1. how to trigger a savepoint?
> >> 2.
> >> how to find the available savepoints/checkpoints for a job?
> >> 3. how to specify a savepoint/checkpoint for restore?
> >>
> >> However, 3 is already supported by the Flink SQL client, leaving 2
> >> questions. As we discussed previously, the most straightforward solution
> >> is to extend Flink’s SQL parser to support savepoint commands. In this
> >> way, we treat a savepoint command as a normal SQL statement. So we could
> >> split the topic into SQL syntax and implementation.
> >>
> >> WRT SQL syntax, to follow the upstream-first philosophy, we’d better
> >> align these efforts with the Flink community. So I think we should draft
> >> a proposal and start a discussion in the Flink community to determine a
> >> solution, then we could implement it in Kyuubi first and push it back to
> >> Flink (I’m planning to start a discussion in the Flink community this
> >> week).
> >>
> >> We have two solutions (thanks to Cheng):
> >>
> >> 1) ANSI SQL
> >>
> >> `CALL trigger_savepoint($query_id)`
> >> `CALL show_savepoint($query_id)`
> >>
> >> pros:
> >> - no syntax conflicts
> >> - respects ANSI SQL
> >>
> >> cons:
> >> - CALL is not used in Flink SQL yet
> >> - not sure if it’s viable to return savepoint paths, because stored
> >> procedures should return row counts in normal cases
> >>
> >> 2) Custom command
> >>
> >> `TRIGGER SAVEPOINT $query_id`
> >> `SHOW SAVEPOINT $query_id`
> >>
> >> pros:
> >> - simple syntax, easy to understand
> >>
> >> cons:
> >> - need to introduce new reserved keywords TRIGGER/SAVEPOINT
> >> - not ANSI SQL compatible
> >>
> >> WRT implementation, first we need a query ID, namely the Flink job ID,
> >> which we could acquire through TableResult with a few adjustments
> >> to ExecuteStatement in the Flink engine.
> >>
> >> There are 2 approaches to return the query ID to the clients.
> >>
> >> 1) TGetQueryIdReq/Resp
> >> The clients need to request the query ID when a query is finished.
> >> Given that the original semantics of the Req is to return all query IDs
> >> in the session[1], we may need to change it to "the ID of the latest
> >> query", or else it would be difficult for users to figure out which ID
> >> is the right one.
> >>
> >> 2) Return it in the result set
> >> This approach is straightforward. Flink returns -1 as the affected rows,
> >> which is not very useful. We can simply replace that with the query ID.
> >>
> >> Please tell me what you think. Thanks a lot!
> >>
> >> [1]
> >> https://github.com/apache/hive/blob/bf84d8a1f715d7457037192d97676aeffa35d571/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1761
> >>
> >>
> >>> On Mar 24, 2022, at 18:15, Vino Yang <yanghua1...@gmail.com> wrote:
> >>>
> >>> Hi Paul,
> >>>
> >>> Big +1 for the proposal.
> >>>
> >>> You can summarize all of this into a design document. And drive this
> >>> feature!
> >>>
> >>> Best,
> >>> Vino
> >>>
> >>> On Tue, Mar 22, 2022 at 14:40, Paul Lam <paullin3...@gmail.com> wrote:
> >>>>
> >>>> Hi Kent,
> >>>>
> >>>> Thanks for your pointer!
> >>>>
> >>>> TGetQueryIdReq/Resp looks very promising.
> >>>>
> >>>> Best,
> >>>> Paul Lam
> >>>>
> >>>>> On Mar 21, 2022, at 12:20, Kent Yao <y...@apache.org> wrote:
> >>>>>
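The four-step polling handshake Cheng outlines above (ExecuteQueryReq → OpHandle → poll TGetQueryIdReq until a legal result) could be mocked as a small sketch. All class and method names here are illustrative stand-ins, not the real Kyuubi Thrift API:

```python
import time

# Sentinel the server returns while the engine has not yet accepted the
# statement (no query ID exists for it yet), per the proposal above.
UNDEFINED_QUERY_ID = "UNDEFINED_QUERY_ID"

class MockEngine:
    """Simulates an engine that assigns the Flink job ID asynchronously."""
    def __init__(self, delay_polls: int):
        self._polls_left = delay_polls

    def execute_statement(self, sql: str) -> str:
        # Step 1/2: the handle comes back immediately, before the job starts.
        return "op-handle-1"

    def get_query_id(self, op_handle: str) -> str:
        # Step 4: return the sentinel until the job ID is known.
        if self._polls_left > 0:
            self._polls_left -= 1
            return UNDEFINED_QUERY_ID
        return "flink-job-abc123"

def fetch_query_id(engine, sql, interval_s=0.01, max_polls=100):
    """Step 3: poll periodically until a legal query ID is returned."""
    handle = engine.execute_statement(sql)
    for _ in range(max_polls):
        qid = engine.get_query_id(handle)
        if qid != UNDEFINED_QUERY_ID:
            return qid
        time.sleep(interval_s)
    raise TimeoutError("query ID not available")
```

The sentinel constant lets the client distinguish "not yet assigned" from a transport error, which is what makes the simplified procedure workable without changing the result-set format.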