> Personally speaking, I think 3) is most likely to be adopted by the Flink
> community, since TRIGGER/SAVEPOINT are already reserved keywords in Flink
> SQL[1]. Should we propose only 3) to the Flink community?
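For concreteness, a gateway-side dispatcher for the command-style statements in 3) could be sketched roughly as below. This is purely illustrative, not actual Kyuubi or Flink code; the patterns and function names are made up for the sketch:

```python
import re

# Hypothetical sketch: recognize the command-style savepoint statements
# (option 3) before delegating everything else to the regular SQL parser.
SAVEPOINT_COMMANDS = [
    (re.compile(r"^TRIGGER\s+SAVEPOINT\s+(\S+)$", re.IGNORECASE), "trigger"),
    (re.compile(r"^SHOW\s+SAVEPOINTS\s+(\S+)$", re.IGNORECASE), "show"),
    (re.compile(r"^REMOVE\s+SAVEPOINT\s+(\S+)$", re.IGNORECASE), "remove"),
]

def classify(stmt: str):
    """Return (operation, argument) for a savepoint command, else None."""
    stmt = stmt.strip().rstrip(";")
    for pattern, op in SAVEPOINT_COMMANDS:
        m = pattern.match(stmt)
        if m:
            return op, m.group(1)
    return None  # fall through to the normal SQL parser
```

The point of the sketch is that the new statements can be intercepted without touching the rest of the SQL grammar, which is why new reserved keywords are the main cost of this option.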
I respect your opinion and experience in the Flink community, and if there
are other voices in the Flink community, we can offer other options. I think
the Kyuubi community can gain a lot of benefits by following the
upstream-first philosophy.

Thanks,
Cheng Pan

On Wed, Mar 30, 2022 at 6:38 PM Paul Lam <paullin3...@gmail.com> wrote:
>
> Hi Cheng,
>
> Thanks a lot for your input! IMHO, now the whole design gets pretty clear.
>
> I'll try to re-summarize the plan as follows; please correct me if I'm wrong.
>
> 1. Discuss savepoint SQL syntax with the Flink community
>
> There are 3 types of syntax:
>
> 1) ANSI SQL style with extended objects
>
> SHOW SAVEPOINTS <query_id>
> CREATE SAVEPOINT <query_id>
> DROP SAVEPOINT <savepoint_id>
>
> 2) ANSI SQL style with stored procedures
>
> CALL trigger_savepoint(…)
> CALL show_savepoints(…)
> CALL drop_savepoint(...)
>
> 3) Command-like style
>
> TRIGGER SAVEPOINT <query_id>
> SHOW SAVEPOINTS <query_id>
> REMOVE SAVEPOINT <savepoint_path>
>
> If we reach agreement with the Flink community on one of them, we adopt
> that syntax. There might be some duplicated effort in the short term,
> but eventually we will converge, and reuse most if not all of the
> functionality that Flink provides.
>
> Personally speaking, I think 3) is most likely to be adopted by the Flink
> community, since TRIGGER/SAVEPOINT are already reserved
> keywords in Flink SQL[1]. Should we propose only 3) to the Flink
> community?
>
> 2.
> Draft a KIP about savepoint management
>
> The TODO list as far as we can see:
> 1) Support retrieving the query ID in the Flink engine
> 2) Introduce a SQL layer to support new SQL syntax if needed (compatible
> with Flink SQL)
> 3) Support savepoint-related operations in the Flink engine
> 4) Extend Beeline to support query IDs
>
> Best,
> Paul Lam
>
> > On Mar 30, 2022, at 16:01, Cheng Pan <pan3...@gmail.com> wrote:
> >
> > Thanks Paul, I agree that we’ve reached a consensus on the high level: 1)
> > use SQL to manipulate savepoints, 2) follow the upstream-first
> > philosophy in SQL syntax and RPC protocol to achieve the best
> > compatibility and user experience.
> >
> > Specifically for the details, I'll add some comments.
> >
> >> 1) ANSI SQL
> >> `CALL trigger_savepoint($query_id)`
> >> `CALL show_savepoint($query_id)`
> >
> > We could give more flexibility to the concept of ANSI SQL-like syntax.
> >
> > For instance, we have
> >
> > SHOW TABLES [LIKE ...]
> > ALTER TABLE <table_name> SET xxx
> > ALTER TABLE ADD ...
> > DROP TABLE <table_name>
> > SELECT xxx FROM <table_name>
> > DESC <table_name>
> >
> > We can extend SQL in the same style for savepoints, e.g.
> >
> > SHOW SAVEPOINTS <query_id>
> > CREATE SAVEPOINT <query_id>
> > DROP SAVEPOINT <query_id>
> > SELECT ... FROM <system.savepoint_table_name> WHERE ...
> > DESC <query_id>
> >
> > One example is DistSQL[1]
> >
> > The command style is specific to introducing new SQL action keywords, e.g.
> >
> > OPTIMIZE <table_name>, VACUUM <table_name>, KILL <query_id>,
> > KILL QUERY <query_id>
> >
> > Usually, different engines/databases may have different syntax for the
> > same behavior, or different behavior for the same syntax. Unless the
> > syntax has been adopted by the upstream, I prefer to use
> >
> > CALL <procedure_name>(arg1, arg2, ...)
> >
> > to avoid conflicts, and to switch to the official syntax once the
> > upstream introduces the new syntax.
> >
> >> There are 2 approaches to return the query ID to the clients.
> >>
> >> 1) TGetQueryIdReq/Resp
> >> The clients need to request the query ID when a query is finished.
> >> Given that the original semantics of the Req is to return all query IDs
> >> in the session[1], we may need to change it to "the ID of the latest
> >> query", or else it would be difficult for users to figure out which ID
> >> is the right one.
> >>
> >> 2) Return it in the result set
> >> This approach is straightforward. Flink returns -1 as the affected rows,
> >> which is not very useful. We can simply replace that with the query ID.
> >
> > Having a look at TGetQueryIdReq/Resp, I think we can simplify the
> > procedure to:
> >
> > 1. The client sends an ExecuteQueryReq.
> > 2. The server returns an OpHandle to the client immediately.
> > 3. The client sends TGetQueryIdReq(OpHandle) to ask for the QueryId
> > periodically until it gets a legal result.
> > 4. The server returns the corresponding TGetQueryIdResp(QueryId) if
> > available; otherwise it returns a predefined QueryId constant, e.g.
> > 'UNDEFINED_QUERY_ID', if the statement has not been accepted by the
> > engine (there is no queryId for the stmt yet).
> >
> > [1] https://github.com/apache/shardingsphere/releases/tag/5.1.0
> >
> > Thanks,
> > Cheng Pan
> >
> > On Tue, Mar 29, 2022 at 7:13 PM 林小铂 <link3...@163.com> wrote:
> >>
> >> Hi team,
> >>
> >> Sorry for the late follow-up. It took me some time to do some research.
> >>
> >> TL;DR It’s good to express savepoints in SQL statements. We should join
> >> efforts with the Flink community to discuss SQL syntax for savepoint
> >> statements. There are mainly two styles of SQL syntax to discuss:
> >> ANSI SQL and command-like. The rest are implementation details, such as
> >> how to return the query ID.
> >>
> >> We had an offline discussion on DingTalk last week, and I believe we’ve
> >> reached a consensus on some issues.
> >>
> >> As pointed out in the previous mails, we should consider:
> >> 1. how to trigger a savepoint?
> >> 2.
> >> how to find the available savepoints/checkpoints for a job?
> >> 3. how to specify a savepoint/checkpoint for restore?
> >>
> >> However, 3 is already supported by the Flink SQL client, leaving 2
> >> questions. As we discussed previously, the most straightforward solution
> >> is to extend Flink’s SQL parser to support savepoint commands. In this
> >> way, we treat a savepoint command as a normal SQL statement. So we could
> >> split the topic into SQL syntax and implementation.
> >>
> >> WRT SQL syntax, to follow the upstream-first philosophy, we’d better
> >> align these efforts with the Flink community. So I think we should draft
> >> a proposal and start a discussion in the Flink community to determine a
> >> solution, then we could implement it in Kyuubi first and push it back to
> >> Flink (I’m planning to start a discussion in the Flink community this
> >> week).
> >>
> >> We have two solutions (thanks to Cheng):
> >>
> >> 1) ANSI SQL
> >>
> >> `CALL trigger_savepoint($query_id)`
> >> `CALL show_savepoint($query_id)`
> >>
> >> pros:
> >> - no syntax conflicts
> >> - respects ANSI SQL
> >>
> >> cons:
> >> - CALL is not used in Flink SQL yet
> >> - not sure if it’s viable to return savepoint paths, because stored
> >> procedures should return row counts in normal cases
> >>
> >> 2) Custom command
> >>
> >> `TRIGGER SAVEPOINT $query_id`
> >> `SHOW SAVEPOINT $query_id`
> >>
> >> pros:
> >> - simple syntax, easy to understand
> >>
> >> cons:
> >> - need to introduce new reserved keywords TRIGGER/SAVEPOINT
> >> - not ANSI SQL compatible
> >>
> >> WRT implementation, first we need a query ID, namely the Flink job ID,
> >> which we could acquire through TableResult with a few adjustments
> >> to ExecuteStatement in the Flink engine.
> >>
> >> There are 2 approaches to return the query ID to the clients.
> >>
> >> 1) TGetQueryIdReq/Resp
> >> The clients need to request the query ID when a query is finished.
> >> Given that the original semantics of the Req is to return all query IDs
> >> in the session[1], we may need to change it to "the ID of the latest
> >> query", or else it would be difficult for users to figure out which ID
> >> is the right one.
> >>
> >> 2) Return it in the result set
> >> This approach is straightforward. Flink returns -1 as the affected rows,
> >> which is not very useful. We can simply replace that with the query ID.
> >>
> >> Please tell me what you think. Thanks a lot!
> >>
> >> [1]
> >> https://github.com/apache/hive/blob/bf84d8a1f715d7457037192d97676aeffa35d571/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1761
> >>
> >>
> >>> On Mar 24, 2022, at 18:15, Vino Yang <yanghua1...@gmail.com> wrote:
> >>>
> >>> Hi Paul,
> >>>
> >>> Big +1 for the proposal.
> >>>
> >>> You can summarize all of this into a design document. And drive this
> >>> feature!
> >>>
> >>> Best,
> >>> Vino
> >>>
> >>> On Tue, Mar 22, 2022 at 14:40, Paul Lam <paullin3...@gmail.com> wrote:
> >>>>
> >>>> Hi Kent,
> >>>>
> >>>> Thanks for your pointer!
> >>>>
> >>>> TGetQueryIdReq/Resp looks very promising.
> >>>>
> >>>> Best,
> >>>> Paul Lam
> >>>>
> >>>>> On Mar 21, 2022, at 12:20, Kent Yao <y...@apache.org> wrote:
> >>>>>
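The four-step polling handshake Cheng outlines above (ExecuteQueryReq → OpHandle → poll TGetQueryIdReq until a legal result) could be mocked as a small sketch. All class and method names here are illustrative stand-ins, not the real Kyuubi Thrift API:

```python
import time

# Sentinel the server returns while the engine has not yet accepted the
# statement (no query ID exists for it yet), per the proposal above.
UNDEFINED_QUERY_ID = "UNDEFINED_QUERY_ID"

class MockEngine:
    """Simulates an engine that assigns the Flink job ID asynchronously."""
    def __init__(self, delay_polls: int):
        self._polls_left = delay_polls

    def execute_statement(self, sql: str) -> str:
        # Step 1/2: the handle comes back immediately, before the job starts.
        return "op-handle-1"

    def get_query_id(self, op_handle: str) -> str:
        # Step 4: return the sentinel until the job ID is known.
        if self._polls_left > 0:
            self._polls_left -= 1
            return UNDEFINED_QUERY_ID
        return "flink-job-abc123"

def fetch_query_id(engine, sql, interval_s=0.01, max_polls=100):
    """Step 3: poll periodically until a legal query ID is returned."""
    handle = engine.execute_statement(sql)
    for _ in range(max_polls):
        qid = engine.get_query_id(handle)
        if qid != UNDEFINED_QUERY_ID:
            return qid
        time.sleep(interval_s)
    raise TimeoutError("query ID not available")
```

The sentinel constant lets the client distinguish "not yet assigned" from a transport error, which is what makes the simplified procedure workable without changing the result-set format.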