Sorry for misspelling your name, Shengkai. The autocomplete plugin is not very
wise.

Best,
Paul Lam

> On 18 Apr 2022, at 11:39, Paul Lam <paullin3...@gmail.com> wrote:
> 
> Hi Shengkai,
> 
> You’re right. We can only retrieve the job names from the cluster, and
> display them as query names.
> 
> I agree that the word `QUERY` is kind of ambiguous. Strictly speaking,
> DMLs are not queries, but Hive recognizes DMLs as queries too [1].
> 
> In general, I think `QUERY` is a more SQL-like concept compared to `JOB`,
> and thus friendlier to SQL users, but I’m okay with `JOB` too. WDYT?
> 
> FYI, I’ve drafted the FLIP[2] and I’m starting a new discussion thread soon.
> 
> [1] https://issues.apache.org/jira/browse/HIVE-17483
> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-222%3A+Support+full+query+lifecycle+statements+in+SQL+client
> 
> Best,
> Paul Lam
> 
>> On 18 Apr 2022, at 10:39, Shengkai Fang <fskm...@gmail.com> wrote:
>> 
>> Hi, Paul.
>> 
>> I am just confused about how the client can retrieve the SQL statement
>> from the cluster. The SQL statement has been translated into a JobGraph
>> and submitted to the cluster.
>> 
>> I think we should manage more than just the query statement lifecycle.
>> How about `SHOW JOBS`, which would list the Job ID, Job Name, Job Type
>> (DQL/DML) and Status (running or failing)?
>> 
>> Best,
>> Shengkai
>> 
>> On Tue, 12 Apr 2022 at 11:28, Paul Lam <paullin3...@gmail.com> wrote:
>> 
>>> Hi Jark,
>>> 
>>> Thanks a lot!
>>> 
>>> I’m thinking of the 2nd approach. With this approach, the query lifecycle
>>> statements (show/stop/savepoint, etc.) are basically equivalent
>>> alternatives to the Flink CLI from the user's point of view.
>>> 
>>> BTW, the completed jobs might be missing in `SHOW QUERIES`, because in
>>> application/per-job cluster modes, the clusters stop when the job
>>> terminates.
>>> 
>>> WDYT?
>>> 
>>> Best,
>>> Paul Lam
>>> 
>>>> On 11 Apr 2022, at 14:17, Jark Wu <imj...@gmail.com> wrote:
>>>> 
>>>> Hi Paul, I've granted you the permission.
>>>> 
>>>> Regarding `SHOW QUERIES`, how will you bookkeep and persist the running
>>>> and completed queries? Or will you retrieve the query information from
>>>> the cluster every time you receive the command?
>>>> 
>>>> 
>>>> Best,
>>>> Jark
>>>> 
>>>> 
>>>> On Wed, 6 Apr 2022 at 11:23, Paul Lam <paullin3...@gmail.com> wrote:
>>>> 
>>>>> Hi Timo,
>>>>> 
>>>>> Thanks for your reply!
>>>>> 
>>>>>> It would be great to further investigate which other commands are
>>>>>> required that would usually be executed via CLI commands. I would like
>>>>>> to avoid a large amount of FLIPs each adding a special job lifecycle
>>>>>> command.
>>>>> 
>>>>> Okay. I listed only the commands about jobs/queries that are required
>>>>> for savepoints, for simplicity. I will come up with a complete set of
>>>>> commands for the full lifecycle of jobs.
>>>>> 
>>>>>> I guess job lifecycle commands don't make much sense in the Table API?
>>>>>> Or are you planning to support those via TableEnvironment.executeSql
>>>>>> as well and integrate them into the SQL parser?
>>>>> 
>>>>> Yes, I’m thinking of adding job lifecycle management in the SQL Client.
>>>>> The SQL client could execute queries via TableEnvironment.executeSql
>>>>> and bookkeep the IDs, which is similar to the ResultStore in
>>>>> LocalExecutor.
>>>>> 
>>>>> BTW, may I ask for the permission on Confluence to create a FLIP?
>>>>> 
>>>>> Best,
>>>>> Paul Lam
>>>>> 
>>>>>> On 4 Apr 2022, at 15:36, Timo Walther <twal...@apache.org> wrote:
>>>>>> 
>>>>>> Hi Paul,
>>>>>> 
>>>>>> thanks for proposing this. I think in general it makes sense to have
>>>>> those commands in SQL Client.
>>>>>> 
>>>>>> However, this will be a big shift because we start adding job
>>>>>> lifecycle SQL syntax. It would be great to further investigate which
>>>>>> other commands are required that would usually be executed via CLI
>>>>>> commands. I would like to avoid a large amount of FLIPs each adding a
>>>>>> special job lifecycle command.
>>>>>> 
>>>>>> I guess job lifecycle commands don't make much sense in the Table API?
>>>>>> Or are you planning to support those via TableEnvironment.executeSql
>>>>>> as well and integrate them into the SQL parser?
>>>>>> 
>>>>>> Thanks,
>>>>>> Timo
>>>>>> 
>>>>>> 
>>>>>> On 01.04.22 at 12:28, Paul Lam wrote:
>>>>>>> Hi Martijn,
>>>>>>> 
>>>>>>>> For any extension on the SQL syntax, there should be a FLIP. I would
>>>>>>>> like to understand how this works for both bounded and unbounded
>>>>>>>> jobs, and how this works with the SQL upgrade story. Could you
>>>>>>>> create one?
>>>>>>> Sure. I’m preparing one. Please give me the permission if possible.
>>>>>>> 
>>>>>>> My Confluence user name is `paulin3280`, and the full name is
>>>>>>> `Paul Lam`.
>>>>>>> 
>>>>>>>> I'm also copying in @Timo Walther <twal...@apache.org> and @Jark Wu
>>>>>>>> <imj...@gmail.com> for their opinion on this.
>>>>>>> Looking forward to your opinions @Timo @Jark :)
>>>>>>> 
>>>>>>> Best,
>>>>>>> Paul Lam
>>>>>>> 
>>>>>>>> On 1 Apr 2022, at 18:10, Martijn Visser <martijnvis...@apache.org> wrote:
>>>>>>>> 
>>>>>>>> Hi Paul,
>>>>>>>> 
>>>>>>>> For any extension on the SQL syntax, there should be a FLIP. I would
>>>>>>>> like to understand how this works for both bounded and unbounded
>>>>>>>> jobs, and how this works with the SQL upgrade story. Could you
>>>>>>>> create one?
>>>>>>>> 
>>>>>>>> I'm also copying in @Timo Walther <twal...@apache.org> and @Jark Wu
>>>>>>>> <imj...@gmail.com> for their opinion on this.
>>>>>>>> 
>>>>>>>> Best regards,
>>>>>>>> 
>>>>>>>> Martijn
>>>>>>>> 
>>>>>>>> On Fri, 1 Apr 2022 at 12:01, Paul Lam <paullin3...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> Hi Martijn,
>>>>>>>>> 
>>>>>>>>> Thanks a lot for your input.
>>>>>>>>> 
>>>>>>>>>> Have you already thought about how you would implement this in Flink?
>>>>>>>>> Yes, I roughly thought about the implementation:
>>>>>>>>> 
>>>>>>>>> 1. Extending Executor to support job listing via ClusterClient.
>>>>>>>>> 2. Extending Executor to support savepoint trigger/cancel/remove
>>>>>>>>> via JobClient.
>>>>>>>>> 3. Extending the SQL parser to support the new statements via regex
>>>>>>>>> (AbstractRegexParseStrategy) or Calcite.
>>>>>>>>> 
>>>>>>>>> IMHO, the implementation is not very complicated and barely touches
>>>>>>>>> the architecture of FLIP-91.
>>>>>>>>> (BTW, FLIP-91 might be a little bit outdated and doesn’t fully
>>>>>>>>> reflect the current status of the Flink SQL client/gateway.)
>>>>>>>>> 
>>>>>>>>> WDYT?
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Paul Lam
>>>>>>>>> 
>>>>>>>>>> On 1 Apr 2022, at 17:33, Martijn Visser <mart...@ververica.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi Paul,
>>>>>>>>>> 
>>>>>>>>>> Thanks for opening the discussion. I agree that there are
>>>>>>>>>> opportunities in this area to increase user value.
>>>>>>>>>> 
>>>>>>>>>> I would say that the syntax should be part of a proposal in a
>>>>>>>>>> FLIP, because the implementation would actually be the complex
>>>>>>>>>> part, not so much the syntax :) Especially since this also touches
>>>>>>>>>> on FLIP-91 [1].
>>>>>>>>>> 
>>>>>>>>>> Have you already thought about how you would implement this in Flink?
>>>>>>>>>> 
>>>>>>>>>> Best regards,
>>>>>>>>>> 
>>>>>>>>>> Martijn Visser
>>>>>>>>>> https://twitter.com/MartijnVisser82
>>>>>>>>>> https://github.com/MartijnVisser
>>>>>>>>>> 
>>>>>>>>>> [1]
>>>>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
>>>>>>>>>> 
>>>>>>>>>> On Fri, 1 Apr 2022 at 11:25, Paul Lam <paullin3...@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi team,
>>>>>>>>>>> 
>>>>>>>>>>> Greetings from the Apache Kyuubi (incubating) community. We’re
>>>>>>>>>>> integrating Flink as a SQL engine and aiming to make it
>>>>>>>>>>> production-ready.
>>>>>>>>>>> 
>>>>>>>>>>> However, query/savepoint management is a crucial but missing part
>>>>>>>>>>> of Flink SQL, so we are reaching out to discuss the SQL syntax
>>>>>>>>>>> with the Flink community.
>>>>>>>>>>> 
>>>>>>>>>>> We propose to introduce the following statements:
>>>>>>>>>>> 
>>>>>>>>>>> SHOW QUERIES: shows the running queries in the current session,
>>>>>>>>>>> which mainly returns the query (namely Flink job) IDs and SQL
>>>>>>>>>>> statements.
>>>>>>>>>>> TRIGGER SAVEPOINT <query_id>: triggers a savepoint for the
>>>>>>>>>>> specified query, which returns the stored path of the savepoint.
>>>>>>>>>>> SHOW SAVEPOINTS <query_id>: shows the savepoints for the
>>>>>>>>>>> specified query, which returns the stored paths of the savepoints.
>>>>>>>>>>> REMOVE SAVEPOINT <savepoint_path>: removes the specified savepoint.
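>>>>>>>>>>>
>>>>>>>>>>> As a rough usage sketch of the statements above (the query ID and
>>>>>>>>>>> savepoint path below are made-up placeholders, and details such as
>>>>>>>>>>> literal quoting would be nailed down in the FLIP):
>>>>>>>>>>>
>>>>>>>>>>> ```sql
>>>>>>>>>>> -- list running queries to obtain the target query ID
>>>>>>>>>>> SHOW QUERIES;
>>>>>>>>>>>
>>>>>>>>>>> -- trigger a savepoint for that query; returns the savepoint path
>>>>>>>>>>> TRIGGER SAVEPOINT 'cca7bc1061d61cf15238e92312c2fc20';
>>>>>>>>>>>
>>>>>>>>>>> -- list the savepoints taken for that query
>>>>>>>>>>> SHOW SAVEPOINTS 'cca7bc1061d61cf15238e92312c2fc20';
>>>>>>>>>>>
>>>>>>>>>>> -- remove a savepoint by its stored path
>>>>>>>>>>> REMOVE SAVEPOINT 'hdfs:///flink/savepoints/savepoint-cca7bc';
>>>>>>>>>>> ```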
>>>>>>>>>>> 
>>>>>>>>>>> WRT keywords, `TRIGGER` and `SAVEPOINT` are already reserved
>>>>>>>>>>> keywords in Flink SQL [1], so the only new keyword is `QUERIES`.
>>>>>>>>>>> 
>>>>>>>>>>> If we reach a consensus on the syntax, we could either implement
>>>>>>>>>>> it in Kyuubi and contribute it back to Flink, or implement it
>>>>>>>>>>> directly in Flink.
>>>>>>>>>>> 
>>>>>>>>>>> Looking forward to your feedback ;)
>>>>>>>>>>> 
>>>>>>>>>>> [1]
>>>>>>>>>>> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/table/sql/overview/#reserved-keywords
>>>>>>>>>>> Best,
>>>>>>>>>>> Paul Lam
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>> 
>>> 
> 
