Hi Shengkai,

You’re right. We can only retrieve the job names from the cluster, and display 
them as query names.

I agree that the word `QUERY` is kind of ambiguous. Strictly speaking,
DMLs are not queries, but Hive recognizes DMLs as queries too [1].

In general, I think `QUERY` is a more SQL-like concept compared to `JOB`,
and thus friendlier to SQL users, but I’m okay with `JOB` too. WDYT?
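
For reference, a quick sketch of how the statements proposed earlier in this
thread might look in a session. The job ID and savepoint path below are made
up for illustration, and the quoting of the arguments is my assumption, not
part of the proposal:

```sql
-- List the running queries (namely Flink jobs) in the current session
SHOW QUERIES;

-- Trigger a savepoint for the specified query; returns the savepoint path
TRIGGER SAVEPOINT '9f96c8fc1b5d4a3b8c7e0d2f4a6b8c0e';

-- List the savepoints taken for the specified query
SHOW SAVEPOINTS '9f96c8fc1b5d4a3b8c7e0d2f4a6b8c0e';

-- Remove a savepoint by its stored path
REMOVE SAVEPOINT 'hdfs:///flink/savepoints/savepoint-9f96c8-1234567890ab';
```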

FYI, I’ve drafted the FLIP[2] and I’m starting a new discussion thread soon.

[1] https://issues.apache.org/jira/browse/HIVE-17483
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-222%3A+Support+full+query+lifecycle+statements+in+SQL+client

Best,
Paul Lam

> On 18 Apr 2022, at 10:39, Shengkai Fang <fskm...@gmail.com> wrote:
> 
> Hi, Paul.
> 
> I am just confused about how the client can retrieve the SQL statement from
> the cluster. The SQL statement has been translated to the JobGraph and
> submitted to the cluster.
> 
> I think we will not only manage the query statement lifecycle. How about
> `SHOW JOBS`, which would list the Job ID, Job Name, Job Type (DQL/DML) and
> Status (running or failing)?
> 
> Best,
> Shengkai
> 
> On Tue, 12 Apr 2022 at 11:28, Paul Lam <paullin3...@gmail.com> wrote:
> 
>> Hi Jark,
>> 
>> Thanks a lot!
>> 
>> I’m thinking of the 2nd approach. With this approach, the query lifecycle
>> statements (show/stop/savepoint, etc.) are basically equivalent
>> alternatives to the Flink CLI from the user’s point of view.
>> 
>> BTW, the completed jobs might be missing in `SHOW QUERIES`, because in
>> application/per-job modes, the clusters would stop when the job
>> terminates.
>> 
>> WDYT?
>> 
>> Best,
>> Paul Lam
>> 
>>> On 11 Apr 2022, at 14:17, Jark Wu <imj...@gmail.com> wrote:
>>> 
>>> Hi Paul, I grant the permission to you.
>>> 
>>> Regarding the "SHOW QUERIES", how will you bookkeep and persist the
>>> running and completed queries?
>>> Or will you retrieve the query information from the cluster every time
>>> you receive the command?
>>> 
>>> 
>>> Best,
>>> Jark
>>> 
>>> 
>>> On Wed, 6 Apr 2022 at 11:23, Paul Lam <paullin3...@gmail.com> wrote:
>>> 
>>>> Hi Timo,
>>>> 
>>>> Thanks for your reply!
>>>> 
>>>>> It would be great to further investigate which other commands are
>>>>> required that would usually be executed via CLI commands. I would like
>>>>> to avoid a large number of FLIPs, each adding a special job lifecycle
>>>>> command.
>>>> 
>>>> Okay. I listed only the commands about jobs/queries that are required
>>>> for savepoints, for simplicity. I would come up with a complete set of
>>>> commands for the full lifecycle of jobs.
>>>> 
>>>>> I guess job lifecycle commands don't make much sense in Table API? Or
>>>>> are you planning to support those also via TableEnvironment.executeSql
>>>>> and integrate them into the SQL parser?
>>>> 
>>>> Yes, I’m thinking of adding job lifecycle management in SQL Client. SQL
>>>> Client could execute queries via TableEnvironment.executeSql and
>>>> bookkeep the IDs, which is similar to ResultStore in LocalExecutor.
>>>> 
>>>> BTW, may I ask for the permission on Confluence to create a FLIP?
>>>> 
>>>> Best,
>>>> Paul Lam
>>>> 
>>>>> On 4 Apr 2022, at 15:36, Timo Walther <twal...@apache.org> wrote:
>>>>> 
>>>>> Hi Paul,
>>>>> 
>>>>> thanks for proposing this. I think in general it makes sense to have
>>>> those commands in SQL Client.
>>>>> 
>>>>> However, this will be a big shift because we start adding job lifecycle
>>>>> SQL syntax. It would be great to further investigate which other
>>>>> commands are required that would usually be executed via CLI commands.
>>>>> I would like to avoid a large number of FLIPs, each adding a special
>>>>> job lifecycle command.
>>>>> 
>>>>> I guess job lifecycle commands don't make much sense in Table API? Or
>>>>> are you planning to support those also via TableEnvironment.executeSql
>>>>> and integrate them into the SQL parser?
>>>>> 
>>>>> Thanks,
>>>>> Timo
>>>>> 
>>>>> 
>>>>> On 01.04.22 at 12:28, Paul Lam wrote:
>>>>>> Hi Martijn,
>>>>>> 
>>>>>> For any extension on the SQL syntax, there should be a FLIP. I would
>>>>>> like to understand how this works for both bounded and unbounded
>>>>>> jobs, how this works with the SQL upgrade story. Could you create one?
>>>>>> Sure. I’m preparing one. Please give me the permission if possible.
>>>>>> 
>>>>>> My Confluence user name is `paulin3280`, and the full name is `Paul
>>>>>> Lam`.
>>>>>> 
>>>>>>> I'm also copying in @Timo Walther <twal...@apache.org> and @Jark Wu
>>>>>>> <imj...@gmail.com> for their opinion on this.
>>>>>> Looking forward to your opinions @Timo @Jark :)
>>>>>> 
>>>>>> Best,
>>>>>> Paul Lam
>>>>>> 
>>>>>>> On 1 Apr 2022, at 18:10, Martijn Visser <martijnvis...@apache.org> wrote:
>>>>>>> 
>>>>>>> Hi Paul,
>>>>>>> 
>>>>>>> For any extension on the SQL syntax, there should be a FLIP. I would
>>>>>>> like to understand how this works for both bounded and unbounded
>>>>>>> jobs, how this works with the SQL upgrade story. Could you create one?
>>>>>>> 
>>>>>>> I'm also copying in @Timo Walther <twal...@apache.org> and @Jark Wu
>>>>>>> <imj...@gmail.com> for their opinion on this.
>>>>>>> 
>>>>>>> Best regards,
>>>>>>> 
>>>>>>> Martijn
>>>>>>> 
>>>>>>> On Fri, 1 Apr 2022 at 12:01, Paul Lam <paullin3...@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Hi Martijn,
>>>>>>>> 
>>>>>>>> Thanks a lot for your input.
>>>>>>>> 
>>>>>>>> Have you already thought about how you would implement this in Flink?
>>>>>>>> Yes, I roughly thought about the implementation:
>>>>>>>> 
>>>>>>>> 1. Extending Executor to support job list via ClusterClient.
>>>>>>>> 2. Extending Executor to support savepoint trigger/cancel/remove via
>>>>>>>> JobClient.
>>>>>>>> 3. Extending SQL parser to support the new statements via regex
>>>>>>>> (AbstractRegexParseStrategy) or Calcite.
>>>>>>>> 
>>>>>>>> IMHO, the implementation is not very complicated and barely touches
>>>>>>>> the architecture of FLIP-91.
>>>>>>>> (BTW, FLIP-91 might be a little bit outdated and doesn’t fully
>>>>>>>> reflect the current status of the Flink SQL client/gateway.)
>>>>>>>> 
>>>>>>>> WDYT?
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Paul Lam
>>>>>>>> 
>>>>>>>>> On 1 Apr 2022, at 17:33, Martijn Visser <mart...@ververica.com> wrote:
>>>>>>>>> 
>>>>>>>>> Hi Paul,
>>>>>>>>> 
>>>>>>>>> Thanks for opening the discussion. I agree that there are
>>>>>>>>> opportunities in this area to increase user value.
>>>>>>>>> 
>>>>>>>>> I would say that the syntax should be part of a proposal in a
>>>>>>>>> FLIP, because the implementation would actually be the complex
>>>>>>>>> part, not so much the syntax :) Especially since this also touches
>>>>>>>>> on FLIP-91 [1].
>>>>>>>>> 
>>>>>>>>> Have you already thought about how you would implement this in Flink?
>>>>>>>>> 
>>>>>>>>> Best regards,
>>>>>>>>> 
>>>>>>>>> Martijn Visser
>>>>>>>>> https://twitter.com/MartijnVisser82
>>>>>>>>> https://github.com/MartijnVisser
>>>>>>>>> 
>>>>>>>>> [1]
>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
>>>>>>>>> 
>>>>>>>>> On Fri, 1 Apr 2022 at 11:25, Paul Lam <paullin3...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi team,
>>>>>>>>>> 
>>>>>>>>>> Greetings from the Apache Kyuubi (incubating) community. We’re
>>>>>>>>>> integrating Flink as a SQL engine and aiming to make it
>>>>>>>>>> production-ready.
>>>>>>>>>> 
>>>>>>>>>> However, query/savepoint management is a crucial but missing part
>>>>>>>>>> in Flink SQL, thus we're reaching out to discuss the SQL syntax
>>>>>>>>>> with the Flink community.
>>>>>>>>>> 
>>>>>>>>>> We propose to introduce the following statements:
>>>>>>>>>> 
>>>>>>>>>> SHOW QUERIES: shows the running queries in the current session,
>>>>>>>>>> which mainly returns query (namely Flink job) IDs and SQL
>>>>>>>>>> statements.
>>>>>>>>>> TRIGGER SAVEPOINT <query_id>: triggers a savepoint for the
>>>>>>>>>> specified query, which returns the stored path of the savepoint.
>>>>>>>>>> SHOW SAVEPOINTS <query_id>: shows the savepoints for the specified
>>>>>>>>>> query, which returns the stored paths of the savepoints.
>>>>>>>>>> REMOVE SAVEPOINT <savepoint_path>: removes the specified savepoint.
>>>>>>>>>> 
>>>>>>>>>> WRT keywords, `TRIGGER` and `SAVEPOINT` are already reserved
>>>>>>>>>> keywords in Flink SQL [1], so the only new keyword is `QUERIES`.
>>>>>>>>>> 
>>>>>>>>>> If we reach a consensus on the syntax, we could either implement
>>>>>>>>>> it in Kyuubi and contribute it back to Flink, or directly
>>>>>>>>>> implement it in Flink.
>>>>>>>>>> 
>>>>>>>>>> Looking forward to your feedback ;)
>>>>>>>>>> 
>>>>>>>>>> [1]
>>>>>>>>>> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/table/sql/overview/#reserved-keywords
>>>>>>>>>> Best,
>>>>>>>>>> Paul Lam
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>> 
>> 
