Hi team, Sorry for the late follow-up. It took me some time to do some research.
TL;DR It’s good to express savepoint in SQL statements. We should join efforts withFlink community to discuss SQL syntax for savepoint statements.There’re mainly two styles of SQL syntax to discuss: ANIS-SQL and command-like. And the rests are implementation details, such as how to return the query ID. We had an offline discussion on DingTalk last week, and I believe we’ve reached a consensus on some issues. As pointed out in the previous mails, we should consider 1. how to trigger a savepoint? 2. how to find the available savepoints/checkpoints for a job? 3. how to specify a savepoint/checkpoint for restore? However, 3 is already supported by Flink SQL client, leaving 2 questions. As we discussed previous, the most straightforward solution is to extend Flink’s SQL parser to support savepointcommand. In such way, we treat savepoint command as a normal SQL statement. So we could split the topic into SQL syntax and implementation. WRT SQL syntax, to follow upstreaming-first philosophy, we’d better to align these efforts with Flink community. So I think we should draft a proposal and start a discussion at Flink community to determine a solution , then we could implement it in Kyuubi first and push back to Flink (I’m planning to start a discussion in Flink community this week). We have two solutions (thanks to Cheng): 1) ANSI SQL `CALL trigger_savepoint($query_id)` `CALL show_savepoint($query_id)` pros: - no syntax conflict - respect ANSI SQL cons: - CALL is not used in Flink SQL yet - not sure if it’s viable to return savepoint paths, because stored procedures should return rows count in normal cases 2) Custom command `TRIGGER SAVEPOINT $query_id` `SHOW SAVEPOINT $query_id` pros: - simple syntax, easy to understand cons: - need to introduce new reserved keywords TRIGGER/SAVEPOINT - not ANSI-SQL compatible WRT implementations, first we need a query ID, namely Flink job ID, which we could acquire through TableResult with a few adjustments to ExecuteStatement in Flink Engine. There 2 approach to return the query ID to the clients. 1) TGetQueryIdReq/Resp The clients need to request the query ID when a query is finished. Given that the origin semantic for the Req is to return all query IDs in the session[1], we may needed change it “the ID of the latest query”, or else it would be difficult for users to figure out which ID is the right one. 2) Return it in the result set This approach is straightforward. Flink returns a -1 as the affected rows, which is not very useful. We can simply replace that with the query ID. Please tell me what do you think. Thanks a lot! [1] https://github.com/apache/hive/blob/bf84d8a1f715d7457037192d97676aeffa35d571/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1761 <https://github.com/apache/hive/blob/bf84d8a1f715d7457037192d97676aeffa35d571/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1761> Best, Paul Lam > 2022年3月24日 18:15,Vino Yang <yanghua1...@gmail.com> 写道: > > Hi Paul, > > Big +1 for the proposal. > > You can summarize all of this into a design document. And drive this feature! > > Best, > Vino > > Paul Lam <paullin3...@gmail.com> 于2022年3月22日周二 14:40写道: >> >> Hi Kent, >> >> Thanks for your pointer! >> >> TGetQueryIdReq/Resp looks very promising. >> >> Best, >> Paul Lam >> >>> 2022年3月21日 12:20,Kent Yao <y...@apache.org> 写道: >>> >>> >>