Re: [DISSCUS][Flink Engine] Flink Savepoint/Checkpoint Management

Paul Lam Wed, 30 Mar 2022 19:28:46 -0700

Agreed! Thanks, Cheng!

Best,
Paul Lam


> 2022年3月30日 23:55，Cheng Pan <pan3...@gmail.com> 写道：
> 
>> Personally speaking, I think 3) is most likely to be adopted by Flink 
>> community, since TRIGGER/SAVEPOINT are already reserved keywords in Flink 
>> SQL[1]. Should we propose only 3) to Flink community?
> 
> I respect your opinion and experience in Flink community, and if there
> are other voices in the Flink community, we can offer other options.
> 
> I think the Kyuubi community can get a lot of benefits by following
> the upstream-first philosophy.
> 
> Thanks,
> Cheng Pan
> 
> On Wed, Mar 30, 2022 at 6:38 PM Paul Lam <paullin3...@gmail.com> wrote:
>> 
>> Hi Cheng,
>> 
>> Thanks a lot for your input! IMHO, now the whole design gets pretty clear.
>> 
>> I try to re-summarize the following plan, please collect me if I’m wrong.
>> 
>> 1. Discuss savepoint SQL syntax with Flink community
>> 
>> There 3 types of syntax:
>> 
>> 1) ANSI SQL style with extended object
>> 
>>> SHOW SAVEPOINTS <query_id>
>>> CREATE SAVEPOINT <query_id>
>>> DROP SAVEPOINT <savepoint_id>
>> 
>> 
>> 2) ANSI SQL style with stored procedure
>> 
>> CALL trigger_savepoint(…)
>> CALL show_savepoints(…)
>> CALL drop_savepoint(...)
>> 
>> 3) Command-like style
>> 
>> TRIGGER SAVEPOINT <query_id>
>> SHOW SAVEPOINTS <query_id>
>> REMOVE SAVEPOINT <savepoint_path>
>> 
>> If we reach agreement with Flink community on one of them, we adopt
>> the syntax. There’re might be some duplicate efforts in a short time,
>> but finally we will converge, and reuse most if not all functionalities
>> that Flink provides.
>> 
>> Personally speaking, I think 3) is most likely to be adopted by Flink
>> community, since TRIGGER/SAVEPOINT are already reserved
>> keywords in Flink SQL[1]. Should we propose only 3) to Flink
>> community?
>> 
>> 2. Draft a KIP about savepoint management
>> 
>> TODO list as far we can see:
>> 1) Support retrieving query ID in Flink engine
>> 2) Introduce a SQL layer to support new SQL syntax if needed (compatible
>>    with Flink SQL)
>> 3) Support savepoint related operations in Flink engine
>> 4) Extend Beeline to support query ID
>> 
>> Best,
>> Paul Lam
>> 
>>> 2022年3月30日 16:01，Cheng Pan <pan3...@gmail.com> 写道：
>>> 
>>> Thanks Paul, I agree that we’ve reached a consensus on high-level, 1)
>>> use SQL to manipulate the savepoint, 2) follow upstreaming-first
>>> philosophy in SQL syntax and RPC protocol to achieve the best
>>> compatibility and user experience.
>>> 
>>> Specifically for details, add some comments.
>>> 
>>>> 1) ANSI SQL
>>>> `CALL trigger_savepoint($query_id)`
>>>> `CALL show_savepoint($query_id)`
>>> 
>>> We could give more flexibility to the concept of ANSI SQL-like.
>>> 
>>> For instance, we have
>>> 
>>> SHOW TABLES [LIKE ...]
>>> ALTER TABLE <table_name> SET xxx
>>> ALTER TABLE ADD ...
>>> DROP TABLE <table_name>
>>> SELECT xxx FROM <table_name>
>>> DESC <table_name>
>>> 
>>> We can extend SQL in same style for savepoints, e.g.
>>> 
>>> SHOW SAVEPOINTS <query_id>
>>> CREATE SAVEPOINT <query_id>
>>> DROP SAVEPOINT <query_id>
>>> SELECT ... FROM <system.savepoint_table_name> WHERE ...
>>> DESC <query_id>
>>> 
>>> One example is DistSQL[1]
>>> 
>>> The command style is specific to introduce new SQL action keywords, e.g.
>>> 
>>> OPTIMIZE <table_name>, VACUUM <table_name>, KILL <query_id>, KILL
>>> QUERY <query_id>
>>> 
>>> Usually, different engines/databases may have different syntax for the
>>> same behavior or different behavior in the same syntax. Unless the
>>> syntax has been adopted by the upstream, I prefer to use
>>> 
>>> CALL <procedure_name>(arg1, arg2, ...)
>>> 
>>> to avoid conflicting, and switch to the official syntax once the
>>> upstream introduces the new syntax.
>>> 
>>>> There 2 approach to return the query ID to the clients.
>>>> 
>>>> 1) TGetQueryIdReq/Resp
>>>> The clients need to request the query ID when a query is finished.
>>>> Given that the origin semantic for the Req is to return all query IDs in 
>>>> the session[1],
>>>> we may needed change it “the ID of the latest query”, or else it would be 
>>>> difficult
>>>> for users to figure out which ID is the right one.
>>>> 
>>>> 2) Return it in the result set
>>>> This approach is straightforward. Flink returns a -1 as the affected rows,
>>>> which is not very useful. We can simply replace that with the query ID.
>>> 
>>> Have a look on the TGetQueryIdReq/Resp, I think we can simplify the 
>>> procedure to
>>> 
>>> 1. client sends an ExecuteQueryReq
>>> 2. server returns an OpHandle to client immediately
>>> 3. client sends TGetQueryIdReq(OpHandle) to ask for QueryId
>>> periodically until a legal result.
>>> 4. server returns the corresponding TGetQueryIdResp(QueryId) is
>>> available, otherwise returns a predefined QueryId constant e.g.
>>> 'UNDEFINED_QUERY_ID' if the statement does not accepted by the engine
>>> (there is no queryId for the stmt now)
>>> 
>>> [1] https://github.com/apache/shardingsphere/releases/tag/5.1.0
>>> 
>>> Thanks,
>>> Cheng Pan
>>> 
>>> On Tue, Mar 29, 2022 at 7:13 PM 林小铂 <link3...@163.com> wrote:
>>>> 
>>>> Hi team,
>>>> 
>>>> Sorry for the late follow-up. It took me some time to do some research.
>>>> 
>>>> TL;DR  It’s good to express savepoint in SQL statements. We should join 
>>>> efforts
>>>> withFlink community to discuss SQL syntax for savepoint statements.There’re
>>>> mainly two styles of SQL syntax to discuss: ANIS-SQL and command-like. And
>>>> the rests are implementation details, such as how to return the query ID.
>>>> 
>>>> We had an offline discussion on DingTalk last week, and I believe we’ve 
>>>> reached
>>>> a consensus on some issues.
>>>> 
>>>> As pointed out in the previous mails, we should consider
>>>> 1. how to trigger a savepoint?
>>>> 2. how to find the available savepoints/checkpoints for a job?
>>>> 3. how to specify a savepoint/checkpoint for restore?
>>>> 
>>>> However, 3 is already supported by Flink SQL client, leaving 2 questions. 
>>>> As we
>>>> discussed previous, the most straightforward solution is to extend Flink’s 
>>>> SQL
>>>> parser to support savepointcommand. In such way, we treat savepoint
>>>> command as a normal SQL statement. So we could split the topic into SQL
>>>> syntax and implementation.
>>>> 
>>>> WRT SQL syntax, to follow upstreaming-first philosophy, we’d better to 
>>>> align
>>>> these efforts with Flink community. So I think we should draft a proposal 
>>>> and
>>>> start a discussion at Flink community to determine a solution , then we 
>>>> could
>>>> implement it in Kyuubi first and push back to Flink (I’m planning to start 
>>>> a
>>>> discussion in Flink community this week).
>>>> 
>>>> We have two solutions (thanks to Cheng):
>>>> 
>>>> 1) ANSI SQL
>>>> 
>>>>  `CALL trigger_savepoint($query_id)`
>>>>  `CALL show_savepoint($query_id)`
>>>> 
>>>> pros:
>>>> - no syntax conflict
>>>> - respect ANSI SQL
>>>> 
>>>> cons:
>>>> - CALL is not used in Flink SQL yet
>>>> - not sure if it’s viable to return savepoint paths, because stored 
>>>> procedures
>>>> should return rows count in normal cases
>>>> 
>>>> 2)  Custom command
>>>> 
>>>> `TRIGGER SAVEPOINT $query_id`
>>>> `SHOW SAVEPOINT $query_id`
>>>> 
>>>> pros:
>>>> - simple syntax, easy to understand
>>>> 
>>>> cons:
>>>> - need to introduce new reserved keywords TRIGGER/SAVEPOINT
>>>> - not ANSI-SQL compatible
>>>> 
>>>> 
>>>> WRT implementations, first we need a query ID, namely Flink job ID,
>>>> which we could acquire through TableResult with a few adjustments
>>>> to ExecuteStatement in Flink Engine.
>>>> 
>>>> There 2 approach to return the query ID to the clients.
>>>> 
>>>> 1) TGetQueryIdReq/Resp
>>>> The clients need to request the query ID when a query is finished.
>>>> Given that the origin semantic for the Req is to return all query IDs in 
>>>> the session[1],
>>>> we may needed change it “the ID of the latest query”, or else it would be 
>>>> difficult
>>>> for users to figure out which ID is the right one.
>>>> 
>>>> 2) Return it in the result set
>>>> This approach is straightforward. Flink returns a -1 as the affected rows,
>>>> which is not very useful. We can simply replace that with the query ID.
>>>> 
>>>> Please tell me what do you think. Thanks a lot!
>>>> 
>>>> [1] 
>>>> https://github.com/apache/hive/blob/bf84d8a1f715d7457037192d97676aeffa35d571/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1761
>>>> 
>>>> 
>>>>> 2022年3月24日 18:15，Vino Yang <yanghua1...@gmail.com> 写道：
>>>>> 
>>>>> Hi Paul,
>>>>> 
>>>>> Big +1 for the proposal.
>>>>> 
>>>>> You can summarize all of this into a design document. And drive this 
>>>>> feature!
>>>>> 
>>>>> Best,
>>>>> Vino
>>>>> 
>>>>> Paul Lam <paullin3...@gmail.com> 于2022年3月22日周二 14:40写道：
>>>>>> 
>>>>>> Hi Kent,
>>>>>> 
>>>>>> Thanks for your pointer!
>>>>>> 
>>>>>> TGetQueryIdReq/Resp looks very promising.
>>>>>> 
>>>>>> Best,
>>>>>> Paul Lam
>>>>>> 
>>>>>>> 2022年3月21日 12:20，Kent Yao <y...@apache.org> 写道：
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>

Re: [DISSCUS][Flink Engine] Flink Savepoint/Checkpoint Management

Reply via email to