Re: [DISSCUS][Flink Engine] Flink Savepoint/Checkpoint Management

Cheng Pan Wed, 30 Mar 2022 01:01:40 -0700

Thanks Paul, I agree that we’ve reached a consensus on high-level, 1)
use SQL to manipulate the savepoint, 2) follow upstreaming-first
philosophy in SQL syntax and RPC protocol to achieve the best
compatibility and user experience.


Specifically for details, add some comments.

> 1) ANSI SQL
>   `CALL trigger_savepoint($query_id)`
>   `CALL show_savepoint($query_id)`

We could give more flexibility to the concept of ANSI SQL-like.

For instance, we have

SHOW TABLES [LIKE ...]
ALTER TABLE <table_name> SET xxx
ALTER TABLE ADD ...
DROP TABLE <table_name>
SELECT xxx FROM <table_name>
DESC <table_name>

We can extend SQL in same style for savepoints, e.g.

SHOW SAVEPOINTS <query_id>
CREATE SAVEPOINT <query_id>
DROP SAVEPOINT <query_id>
SELECT ... FROM <system.savepoint_table_name> WHERE ...
DESC <query_id>

One example is DistSQL[1]

The command style is specific to introduce new SQL action keywords, e.g.

OPTIMIZE <table_name>, VACUUM <table_name>, KILL <query_id>, KILL
QUERY <query_id>

Usually, different engines/databases may have different syntax for the
same behavior or different behavior in the same syntax. Unless the
syntax has been adopted by the upstream, I prefer to use

CALL <procedure_name>(arg1, arg2, ...)

to avoid conflicting, and switch to the official syntax once the
upstream introduces the new syntax.

> There 2 approach to return the query ID to the clients.
>
> 1) TGetQueryIdReq/Resp
> The clients need to request the query ID when a query is finished.
> Given that the origin semantic for the Req is to return all query IDs in the 
> session[1],
> we may needed change it “the ID of the latest query”, or else it would be 
> difficult
> for users to figure out which ID is the right one.
>
> 2) Return it in the result set
> This approach is straightforward. Flink returns a -1 as the affected rows,
> which is not very useful. We can simply replace that with the query ID.

Have a look on the TGetQueryIdReq/Resp, I think we can simplify the procedure to

1. client sends an ExecuteQueryReq
2. server returns an OpHandle to client immediately
3. client sends TGetQueryIdReq(OpHandle) to ask for QueryId
periodically until a legal result.
4. server returns the corresponding TGetQueryIdResp(QueryId) is
available, otherwise returns a predefined QueryId constant e.g.
'UNDEFINED_QUERY_ID' if the statement does not accepted by the engine
(there is no queryId for the stmt now)

[1] https://github.com/apache/shardingsphere/releases/tag/5.1.0

Thanks,
Cheng Pan

On Tue, Mar 29, 2022 at 7:13 PM 林小铂 <link3...@163.com> wrote:
>
> Hi team,
>
> Sorry for the late follow-up. It took me some time to do some research.
>
> TL;DR  It’s good to express savepoint in SQL statements. We should join 
> efforts
> withFlink community to discuss SQL syntax for savepoint statements.There’re
> mainly two styles of SQL syntax to discuss: ANIS-SQL and command-like. And
> the rests are implementation details, such as how to return the query ID.
>
> We had an offline discussion on DingTalk last week, and I believe we’ve 
> reached
> a consensus on some issues.
>
> As pointed out in the previous mails, we should consider
> 1. how to trigger a savepoint?
> 2. how to find the available savepoints/checkpoints for a job?
> 3. how to specify a savepoint/checkpoint for restore?
>
> However, 3 is already supported by Flink SQL client, leaving 2 questions. As 
> we
> discussed previous, the most straightforward solution is to extend Flink’s SQL
> parser to support savepointcommand. In such way, we treat savepoint
> command as a normal SQL statement. So we could split the topic into SQL
> syntax and implementation.
>
> WRT SQL syntax, to follow upstreaming-first philosophy, we’d better to align
> these efforts with Flink community. So I think we should draft a proposal and
> start a discussion at Flink community to determine a solution , then we could
> implement it in Kyuubi first and push back to Flink (I’m planning to start a
> discussion in Flink community this week).
>
> We have two solutions (thanks to Cheng):
>
> 1) ANSI SQL
>
>    `CALL trigger_savepoint($query_id)`
>    `CALL show_savepoint($query_id)`
>
> pros:
> - no syntax conflict
> - respect ANSI SQL
>
> cons:
> - CALL is not used in Flink SQL yet
> - not sure if it’s viable to return savepoint paths, because stored procedures
>   should return rows count in normal cases
>
> 2)  Custom command
>
>   `TRIGGER SAVEPOINT $query_id`
>   `SHOW SAVEPOINT $query_id`
>
> pros:
> - simple syntax, easy to understand
>
> cons:
> - need to introduce new reserved keywords TRIGGER/SAVEPOINT
> - not ANSI-SQL compatible
>
>
> WRT implementations, first we need a query ID, namely Flink job ID,
> which we could acquire through TableResult with a few adjustments
> to ExecuteStatement in Flink Engine.
>
> There 2 approach to return the query ID to the clients.
>
> 1) TGetQueryIdReq/Resp
> The clients need to request the query ID when a query is finished.
> Given that the origin semantic for the Req is to return all query IDs in the 
> session[1],
> we may needed change it “the ID of the latest query”, or else it would be 
> difficult
> for users to figure out which ID is the right one.
>
> 2) Return it in the result set
> This approach is straightforward. Flink returns a -1 as the affected rows,
> which is not very useful. We can simply replace that with the query ID.
>
> Please tell me what do you think. Thanks a lot!
>
> [1] 
> https://github.com/apache/hive/blob/bf84d8a1f715d7457037192d97676aeffa35d571/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1761
>
>
> > 2022年3月24日 18:15，Vino Yang <yanghua1...@gmail.com> 写道：
> >
> > Hi Paul,
> >
> > Big +1 for the proposal.
> >
> > You can summarize all of this into a design document. And drive this 
> > feature!
> >
> > Best,
> > Vino
> >
> > Paul Lam <paullin3...@gmail.com> 于2022年3月22日周二 14:40写道：
> >>
> >> Hi Kent,
> >>
> >> Thanks for your pointer!
> >>
> >> TGetQueryIdReq/Resp looks very promising.
> >>
> >> Best,
> >> Paul Lam
> >>
> >>> 2022年3月21日 12:20，Kent Yao <y...@apache.org> 写道：
> >>>
> >>>
> >>
>

Re: [DISSCUS][Flink Engine] Flink Savepoint/Checkpoint Management

Reply via email to