Re: [DISSCUS][Flink Engine] Flink Savepoint/Checkpoint Management

Paul Lam Tue, 29 Mar 2022 02:54:22 -0700

Hi team,

Sorry for the late follow-up. It took me some time to do some research.


TL;DR: It’s good to express savepoint operations as SQL statements. We should join efforts
with the Flink community to discuss the SQL syntax for savepoint statements. There are
mainly two styles of SQL syntax to discuss: ANSI SQL and command-like. The
rest are implementation details, such as how to return the query ID.

We had an offline discussion on DingTalk last week, and I believe we’ve reached 
a consensus on some issues.

As pointed out in the previous mails, we should consider
1. how to trigger a savepoint?
2. how to find the available savepoints/checkpoints for a job?
3. how to specify a savepoint/checkpoint for restore?

However, 3 is already supported by the Flink SQL client, leaving the first two
questions. As we discussed previously, the most straightforward solution is to
extend Flink’s SQL parser to support savepoint commands. That way, we treat a
savepoint command as a normal SQL statement. So we could split the topic into SQL
syntax and implementation.

WRT SQL syntax, to follow the upstream-first philosophy, we’d better align
these efforts with the Flink community. So I think we should draft a proposal and
start a discussion in the Flink community to determine a solution; then we could
implement it in Kyuubi first and push it back to Flink (I’m planning to start a
discussion in the Flink community this week).

We have two solutions (thanks to Cheng):

1) ANSI SQL

   `CALL trigger_savepoint($query_id)`
   `CALL show_savepoint($query_id)`

pros: 
- no syntax conflict
- respect ANSI SQL

cons:
- CALL is not used in Flink SQL yet
- not sure if it’s viable to return savepoint paths, because stored procedures
  normally return a row count

2)  Custom command

  `TRIGGER SAVEPOINT $query_id`
  `SHOW SAVEPOINT $query_id`

pros:
- simple syntax, easy to understand

cons:
- need to introduce new reserved keywords TRIGGER/SAVEPOINT 
- not ANSI-SQL compatible
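
To make the comparison concrete, here is a rough sketch (Python, purely
illustrative) showing that both styles can map to the same internal command,
so the choice is mostly about the parser surface. The names SavepointCommand
and parse_savepoint_statement are hypothetical, the regexes stand in for a real
SQL parser extension, and treating the query ID as an optionally quoted hex
string is an assumption on my side.

```python
import re
from typing import NamedTuple, Optional


class SavepointCommand(NamedTuple):
    """Hypothetical internal representation shared by both syntax styles."""
    action: str    # "trigger" or "show"
    query_id: str  # query ID as written by the client (assumed hex here)


# Style 1 (ANSI-like): CALL trigger_savepoint(<id>) / CALL show_savepoint(<id>)
_CALL_STYLE = re.compile(
    r"^\s*CALL\s+(trigger|show)_savepoint\s*\(\s*'?([0-9a-fA-F]+)'?\s*\)\s*;?\s*$",
    re.IGNORECASE,
)

# Style 2 (command-like): TRIGGER SAVEPOINT <id> / SHOW SAVEPOINT <id>
_COMMAND_STYLE = re.compile(
    r"^\s*(TRIGGER|SHOW)\s+SAVEPOINT\s+'?([0-9a-fA-F]+)'?\s*;?\s*$",
    re.IGNORECASE,
)


def parse_savepoint_statement(sql: str) -> Optional[SavepointCommand]:
    """Return a SavepointCommand if sql matches either style, else None."""
    for pattern in (_CALL_STYLE, _COMMAND_STYLE):
        match = pattern.match(sql)
        if match:
            return SavepointCommand(match.group(1).lower(), match.group(2))
    return None  # not a savepoint statement; fall through to the normal path
```

With this shape, `CALL trigger_savepoint('a1b2c3')` and
`TRIGGER SAVEPOINT a1b2c3` produce the same command object, and any other
statement is left to the regular SQL path.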


WRT implementation, first we need a query ID, namely the Flink job ID,
which we could acquire through TableResult with a few adjustments
to ExecuteStatement in the Flink engine.

There are two approaches to returning the query ID to the clients.

1) TGetQueryIdReq/Resp
The clients need to request the query ID when a query is finished.
Given that the original semantics of the Req is to return all query IDs in the
session[1], we may need to change it to “the ID of the latest query”, or else it
would be difficult for users to figure out which ID is the right one.

2) Return it in the result set 
This approach is straightforward. Flink returns a -1 as the affected rows, 
which is not very useful. We can simply replace that with the query ID.
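
A minimal sketch of how the two approaches differ from the client’s point of
view (Python, illustrative only). Session, StatementResult and the method
names are hypothetical and do not correspond to actual Kyuubi or Flink classes;
the single-column result shape is also just an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StatementResult:
    """Hypothetical result set handed back to the client."""
    columns: List[str]
    rows: List[list]


@dataclass
class Session:
    """Hypothetical session that remembers the IDs of submitted queries."""
    query_ids: List[str] = field(default_factory=list)

    def record(self, query_id: str) -> None:
        self.query_ids.append(query_id)

    def latest_query_id(self) -> Optional[str]:
        # Approach 1 with the changed semantics: return only the most
        # recent ID instead of every ID in the session.
        return self.query_ids[-1] if self.query_ids else None


def affected_rows_result() -> StatementResult:
    # Current behaviour described above: -1 as the affected rows.
    return StatementResult(columns=["affected_rows"], rows=[[-1]])


def query_id_result(session: Session, query_id: str) -> StatementResult:
    # Approach 2: put the query ID where the -1 used to be.
    session.record(query_id)
    return StatementResult(columns=["query_id"], rows=[[query_id]])
```

With approach 1 the client makes a second round trip (latest_query_id) after
submitting; with approach 2 the ID arrives together with the statement result.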

Please tell me what you think. Thanks a lot!

[1] https://github.com/apache/hive/blob/bf84d8a1f715d7457037192d97676aeffa35d571/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1761

Best,
Paul Lam

> On Mar 24, 2022, at 18:15, Vino Yang <yanghua1...@gmail.com> wrote:
> 
> Hi Paul,
> 
> Big +1 for the proposal.
> 
> You can summarize all of this into a design document. And drive this feature!
> 
> Best,
> Vino
> 
> On Tue, Mar 22, 2022 at 14:40, Paul Lam <paullin3...@gmail.com> wrote:
>> 
>> Hi Kent,
>> 
>> Thanks for your pointer!
>> 
>> TGetQueryIdReq/Resp looks very promising.
>> 
>> Best,
>> Paul Lam
>> 
>>> On Mar 21, 2022, at 12:20, Kent Yao <y...@apache.org> wrote:
>>> 
>>> 
>>
