Re: [DISCUSS] FLIP-222: Support full query lifecycle statements in SQL client

Paul Lam Thu, 09 Jun 2022 00:27:40 -0700

Hi team,

It's great to see our opinions are finally converging!


> `STOP JOB <job_id> [WITH SAVEPOINT] [WITH DRAIN] `


LGTM. Adding it to the FLIP.

To Jark,

> We can simplify the statement to "CREATE SAVEPOINT FOR JOB <job_id>”

Good point. The default savepoint dir should be enough for most cases.

To Jing,

> DROP SAVEPOINT ALL

I think it’s valid to have such a statement, but I have two concerns:
`ALL` is already an SQL keyword, thus it may cause ambiguity.
Flink CLI and REST API doesn’t provided the corresponding functionalities, and 
we’d better keep them aligned.
How about making this statement as follow-up tasks which should touch REST API 
and Flink CLI?

Best,
Paul Lam

> 2022年6月9日 11:53，godfrey he <[email protected]> 写道：
> 
> Hi all,
> 
> Regarding `PIPELINE`, it comes from flink-core module, see
> `PipelineOptions` class for more details.
> `JOBS` is a more generic concept than `PIPELINES`. I'm also be fine with 
> `JOBS`.
> 
> +1 to discuss JOBTREE in other FLIP.
> 
> +1 to `STOP JOB <job_id> [WITH SAVEPOINT] [WITH DRAIN] `
> 
> +1 to `CREATE SAVEPOINT FOR JOB <job_id>` and `DROP SAVEPOINT 
> <savepoint_path>`
> 
> Best,
> Godfrey
> 
> Jing Ge <[email protected]> 于2022年6月9日周四 01:48写道：
>> 
>> Hi Paul, Hi Jark,
>> 
>> Re JOBTREE, agree that it is out of the scope of this FLIP
>> 
>> Re `RELEASE SAVEPOINT ALL', if the community prefers 'DROP' then 'DROP 
>> SAVEPOINT ALL' housekeeping. WDYT?
>> 
>> Best regards,
>> Jing
>> 
>> 
>> On Wed, Jun 8, 2022 at 2:54 PM Jark Wu <[email protected]> wrote:
>>> 
>>> Hi Jing,
>>> 
>>> Regarding JOBTREE (job lineage), I agree with Paul that this is out of the 
>>> scope
>>> of this FLIP and can be discussed in another FLIP.
>>> 
>>> Job lineage is a big topic that may involve many problems:
>>> 1) how to collect and report job entities, attributes, and lineages?
>>> 2) how to integrate with data catalogs, e.g. Apache Atlas, DataHub?
>>> 3) how does Flink SQL CLI/Gateway know the lineage information and show 
>>> jobtree?
>>> 4) ...
>>> 
>>> Best,
>>> Jark
>>> 
>>> On Wed, 8 Jun 2022 at 20:44, Jark Wu <[email protected]> wrote:
>>>> 
>>>> Hi Paul,
>>>> 
>>>> I'm fine with using JOBS. The only concern is that this may conflict with 
>>>> displaying more detailed
>>>> information for query (e.g. query content, plan) in the future, e.g. SHOW 
>>>> QUERIES EXTENDED in ksqldb[1].
>>>> This is not a big problem as we can introduce SHOW QUERIES in the future 
>>>> if necessary.
>>>> 
>>>>> STOP JOBS <job_id> (with options `table.job.stop-with-savepoint` and 
>>>>> `table.job.stop-with-drain`)
>>>> What about STOP JOB <job_id> [WITH SAVEPOINT] [WITH DRAIN] ?
>>>> It might be trivial and error-prone to set configuration before executing 
>>>> a statement,
>>>> and the configuration will affect all statements after that.
>>>> 
>>>>> CREATE SAVEPOINT <savepoint_path> FOR JOB <job_id>
>>>> We can simplify the statement to "CREATE SAVEPOINT FOR JOB <job_id>",
>>>> and always use configuration "state.savepoints.dir" as the default 
>>>> savepoint dir.
>>>> The concern with using "<savepoint_path>" is here should be savepoint dir,
>>>> and savepoint_path is the returned value.
>>>> 
>>>> I'm fine with other changes.
>>>> 
>>>> Thanks,
>>>> Jark
>>>> 
>>>> [1]: 
>>>> https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/show-queries/
>>>> 
>>>> 
>>>> 
>>>> On Wed, 8 Jun 2022 at 15:07, Paul Lam <[email protected]> wrote:
>>>>> 
>>>>> Hi Jing,
>>>>> 
>>>>> Thank you for your inputs!
>>>>> 
>>>>> TBH, I haven’t considered the ETL scenario that you mentioned. I think 
>>>>> they’re managed just like other jobs interns of job lifecycles (please 
>>>>> correct me if I’m wrong).
>>>>> 
>>>>> WRT to the SQL statements about SQL lineages, I think it might be a 
>>>>> little bit out of the scope of the FLIP, since it’s mainly about 
>>>>> lifecycles. By the way, do we have these functionalities in Flink CLI or 
>>>>> REST API already?
>>>>> 
>>>>> WRT `RELEASE SAVEPOINT ALL`, I’m sorry for the deprecated FLIP docs, the 
>>>>> community is more in favor of `DROP SAVEPOINT <savepoint_path>`. I’m 
>>>>> updating the FLIP arcading to the latest discussions.
>>>>> 
>>>>> Best,
>>>>> Paul Lam
>>>>> 
>>>>> 2022年6月8日 07:31，Jing Ge <[email protected]> 写道：
>>>>> 
>>>>> Hi Paul,
>>>>> 
>>>>> Sorry that I am a little bit too late to join this thread. Thanks for 
>>>>> driving this and starting this informative discussion. The FLIP looks 
>>>>> really interesting. It will help us a lot to manage Flink SQL jobs.
>>>>> 
>>>>> Have you considered the ETL scenario with Flink SQL, where multiple SQLs 
>>>>> build a DAG for many DAGs?
>>>>> 
>>>>> 1)
>>>>> +1 for SHOW JOBS. I think sooner or later we will start to discuss how to 
>>>>> support ETL jobs. Briefly speaking, SQLs that used to build the DAG are 
>>>>> responsible to *produce* data as the result(cube, materialized view, 
>>>>> etc.) for the future consumption by queries. The INSERT INTO SELECT FROM 
>>>>> example in FLIP and CTAS are typical SQL in this case. I would prefer to 
>>>>> call them Jobs instead of Queries.
>>>>> 
>>>>> 2)
>>>>> Speaking of ETL DAG, we might want to see the lineage. Is it possible to 
>>>>> support syntax like:
>>>>> 
>>>>> SHOW JOBTREE <job_id>  // shows the downstream DAG from the given job_id
>>>>> SHOW JOBTREE <job_id> FULL // shows the whole DAG that contains the given 
>>>>> job_id
>>>>> SHOW JOBTREES // shows all DAGs
>>>>> SHOW ANCIENTS <job_id> // shows all parents of the given job_id
>>>>> 
>>>>> 3)
>>>>> Could we also support Savepoint housekeeping syntax? We ran into this 
>>>>> issue that a lot of savepoints have been created by customers (via their 
>>>>> apps). It will take extra (hacking) effort to clean it.
>>>>> 
>>>>> RELEASE SAVEPOINT ALL
>>>>> 
>>>>> Best regards,
>>>>> Jing
>>>>> 
>>>>> On Tue, Jun 7, 2022 at 2:35 PM Martijn Visser <[email protected]> 
>>>>> wrote:
>>>>>> 
>>>>>> Hi Paul,
>>>>>> 
>>>>>> I'm still doubting the keyword for the SQL applications. SHOW QUERIES 
>>>>>> could
>>>>>> imply that this will actually show the query, but we're returning IDs of
>>>>>> the running application. At first I was also not very much in favour of
>>>>>> SHOW JOBS since I prefer calling it 'Flink applications' and not 'Flink
>>>>>> jobs', but the glossary [1] made me reconsider. I would +1 SHOW/STOP JOBS
>>>>>> 
>>>>>> Also +1 for the CREATE/SHOW/DROP SAVEPOINT syntax.
>>>>>> 
>>>>>> Best regards,
>>>>>> 
>>>>>> Martijn
>>>>>> 
>>>>>> [1]
>>>>>> https://nightlies.apache.org/flink/flink-docs-master/docs/concepts/glossary
>>>>>> 
>>>>>> Op za 4 jun. 2022 om 10:38 schreef Paul Lam <[email protected]>:
>>>>>> 
>>>>>>> Hi Godfrey,
>>>>>>> 
>>>>>>> Sorry for the late reply, I was on vacation.
>>>>>>> 
>>>>>>> It looks like we have a variety of preferences on the syntax, how about 
>>>>>>> we
>>>>>>> choose the most acceptable one?
>>>>>>> 
>>>>>>> WRT keyword for SQL jobs, we use JOBS, thus the statements related to 
>>>>>>> jobs
>>>>>>> would be:
>>>>>>> 
>>>>>>> - SHOW JOBS
>>>>>>> - STOP JOBS <job_id> (with options `table.job.stop-with-savepoint` and
>>>>>>> `table.job.stop-with-drain`)
>>>>>>> 
>>>>>>> WRT savepoint for SQL jobs, we use the `CREATE/DROP` pattern with `FOR
>>>>>>> JOB`:
>>>>>>> 
>>>>>>> - CREATE SAVEPOINT <savepoint_path> FOR JOB <job_id>
>>>>>>> - SHOW SAVEPOINTS FOR JOB <job_id> (show savepoints the current job
>>>>>>> manager remembers)
>>>>>>> - DROP SAVEPOINT <savepoint_path>
>>>>>>> 
>>>>>>> cc @Jark @ShengKai @Martijn @Timo .
>>>>>>> 
>>>>>>> Best,
>>>>>>> Paul Lam
>>>>>>> 
>>>>>>> 
>>>>>>> godfrey he <[email protected]> 于2022年5月23日周一 21:34写道：
>>>>>>> 
>>>>>>>> Hi Paul,
>>>>>>>> 
>>>>>>>> Thanks for the update.
>>>>>>>> 
>>>>>>>>> 'SHOW QUERIES' lists all jobs in the cluster, no limit on APIs
>>>>>>>> (DataStream or SQL) or
>>>>>>>> clients (SQL client or CLI).
>>>>>>>> 
>>>>>>>> Is DataStream job a QUERY? I think not.
>>>>>>>> For a QUERY, the most important concept is the statement. But the
>>>>>>>> result does not contain this info.
>>>>>>>> If we need to contain all jobs in the cluster, I think the name should
>>>>>>>> be JOB or PIPELINE.
>>>>>>>> I learn to SHOW PIPELINES and STOP PIPELINE [IF RUNNING] id.
>>>>>>>> 
>>>>>>>>> SHOW SAVEPOINTS
>>>>>>>> To list the savepoint for a specific job, we need to specify a
>>>>>>>> specific pipeline,
>>>>>>>> the syntax should be SHOW SAVEPOINTS FOR PIPELINE id
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Godfrey
>>>>>>>> 
>>>>>>>> Paul Lam <[email protected]> 于2022年5月20日周五 11:25写道：
>>>>>>>>> 
>>>>>>>>> Hi Jark,
>>>>>>>>> 
>>>>>>>>> WRT “DROP QUERY”, I agree that it’s not very intuitive, and that’s
>>>>>>>>> part of the reason why I proposed “STOP/CANCEL QUERY” at the
>>>>>>>>> beginning. The downside of it is that it’s not ANSI-SQL compatible.
>>>>>>>>> 
>>>>>>>>> Another question is, what should be the syntax for ungracefully
>>>>>>>>> canceling a query? As ShengKai pointed out in a offline discussion,
>>>>>>>>> “STOP QUERY” and “CANCEL QUERY” might confuse SQL users.
>>>>>>>>> Flink CLI has both stop and cancel, mostly due to historical problems.
>>>>>>>>> 
>>>>>>>>> WRT “SHOW SAVEPOINT”, I agree it’s a missing part. My concern is
>>>>>>>>> that savepoints are owned by users and beyond the lifecycle of a Flink
>>>>>>>>> cluster. For example, a user might take a savepoint at a custom path
>>>>>>>>> that’s different than the default savepoint path, I think jobmanager
>>>>>>>> would
>>>>>>>>> not remember that, not to mention the jobmanager may be a fresh new
>>>>>>>>> one after a cluster restart. Thus if we support “SHOW SAVEPOINT”, it's
>>>>>>>>> probably a best-effort one.
>>>>>>>>> 
>>>>>>>>> WRT savepoint syntax, I’m thinking of the semantic of the savepoint.
>>>>>>>>> Savepoints are alias for nested transactions in DB area[1], and 
>>>>>>>>> there’s
>>>>>>>>> correspondingly global transactions. If we consider Flink jobs as
>>>>>>>>> global transactions and Flink checkpoints as nested transactions,
>>>>>>>>> then the savepoint semantics are close, thus I think savepoint syntax
>>>>>>>>> in SQL-standard could be considered. But again, I’m don’t have very
>>>>>>>>> strong preference.
>>>>>>>>> 
>>>>>>>>> Ping @Timo to get more inputs.
>>>>>>>>> 
>>>>>>>>> [1] https://en.wikipedia.org/wiki/Nested_transaction <
>>>>>>>> https://en.wikipedia.org/wiki/Nested_transaction>
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> Paul Lam
>>>>>>>>> 
>>>>>>>>>> 2022年5月18日 17:48，Jark Wu <[email protected]> 写道：
>>>>>>>>>> 
>>>>>>>>>> Hi Paul,
>>>>>>>>>> 
>>>>>>>>>> 1) SHOW QUERIES
>>>>>>>>>> +1 to add finished time, but it would be better to call it "end_time"
>>>>>>>> to
>>>>>>>>>> keep aligned with names in Web UI.
>>>>>>>>>> 
>>>>>>>>>> 2) DROP QUERY
>>>>>>>>>> I think we shouldn't throw exceptions for batch jobs, otherwise, how
>>>>>>>> to
>>>>>>>>>> stop batch queries?
>>>>>>>>>> At present, I don't think "DROP" is a suitable keyword for this
>>>>>>>> statement.
>>>>>>>>>> From the perspective of users, "DROP" sounds like the query should be
>>>>>>>>>> removed from the
>>>>>>>>>> list of "SHOW QUERIES". However, it doesn't. Maybe "STOP QUERY" is
>>>>>>>> more
>>>>>>>>>> suitable and
>>>>>>>>>> compliant with commands of Flink CLI.
>>>>>>>>>> 
>>>>>>>>>> 3) SHOW SAVEPOINTS
>>>>>>>>>> I think this statement is needed, otherwise, savepoints are lost
>>>>>>>> after the
>>>>>>>>>> SAVEPOINT
>>>>>>>>>> command is executed. Savepoints can be retrieved from REST API
>>>>>>>>>> "/jobs/:jobid/checkpoints"
>>>>>>>>>> with filtering "checkpoint_type"="savepoint". It's also worth
>>>>>>>> considering
>>>>>>>>>> providing "SHOW CHECKPOINTS"
>>>>>>>>>> to list all checkpoints.
>>>>>>>>>> 
>>>>>>>>>> 4) SAVEPOINT & RELEASE SAVEPOINT
>>>>>>>>>> I'm a little concerned with the SAVEPOINT and RELEASE SAVEPOINT
>>>>>>>> statements
>>>>>>>>>> now.
>>>>>>>>>> In the vendors, the parameters of SAVEPOINT and RELEASE SAVEPOINT are
>>>>>>>> both
>>>>>>>>>> the same savepoint id.
>>>>>>>>>> However, in our syntax, the first one is query id, and the second one
>>>>>>>> is
>>>>>>>>>> savepoint path, which is confusing and
>>>>>>>>>> not consistent. When I came across SHOW SAVEPOINT, I thought maybe
>>>>>>>> they
>>>>>>>>>> should be in the same syntax set.
>>>>>>>>>> For example, CREATE SAVEPOINT FOR [QUERY] <query_id> & DROP SAVEPOINT
>>>>>>>>>> <sp_path>.
>>>>>>>>>> That means we don't follow the majority of vendors in SAVEPOINT
>>>>>>>> commands. I
>>>>>>>>>> would say the purpose is different in Flink.
>>>>>>>>>> What other's opinion on this?
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Jark
>>>>>>>>>> 
>>>>>>>>>> [1]:
>>>>>>>>>> 
>>>>>>>> https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid-checkpoints
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Wed, 18 May 2022 at 14:43, Paul Lam <[email protected]> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi Godfrey,
>>>>>>>>>>> 
>>>>>>>>>>> Thanks a lot for your inputs!
>>>>>>>>>>> 
>>>>>>>>>>> 'SHOW QUERIES' lists all jobs in the cluster, no limit on APIs
>>>>>>>> (DataStream
>>>>>>>>>>> or SQL) or
>>>>>>>>>>> clients (SQL client or CLI). Under the hook, it’s based on
>>>>>>>>>>> ClusterClient#listJobs, the
>>>>>>>>>>> same with Flink CLI. I think it’s okay to have non-SQL jobs listed
>>>>>>>> in SQL
>>>>>>>>>>> client, because
>>>>>>>>>>> these jobs can be managed via SQL client too.
>>>>>>>>>>> 
>>>>>>>>>>> WRT finished time, I think you’re right. Adding it to the FLIP. But
>>>>>>>> I’m a
>>>>>>>>>>> bit afraid that the
>>>>>>>>>>> rows would be too long.
>>>>>>>>>>> 
>>>>>>>>>>> WRT ‘DROP QUERY’,
>>>>>>>>>>>> What's the behavior for batch jobs and the non-running jobs?
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> In general, the behavior would be aligned with Flink CLI. Triggering
>>>>>>>> a
>>>>>>>>>>> savepoint for
>>>>>>>>>>> a non-running job would cause errors, and the error message would be
>>>>>>>>>>> printed to
>>>>>>>>>>> the SQL client. Triggering a savepoint for batch(unbounded) jobs in
>>>>>>>>>>> streaming
>>>>>>>>>>> execution mode would be the same with streaming jobs. However, for
>>>>>>>> batch
>>>>>>>>>>> jobs in
>>>>>>>>>>> batch execution mode, I think there would be an error, because batch
>>>>>>>>>>> execution
>>>>>>>>>>> doesn’t support checkpoints currently (please correct me if I’m
>>>>>>>> wrong).
>>>>>>>>>>> 
>>>>>>>>>>> WRT ’SHOW SAVEPOINTS’, I’ve thought about it, but Flink
>>>>>>>> clusterClient/
>>>>>>>>>>> jobClient doesn’t have such a functionality at the moment, neither 
>>>>>>>>>>> do
>>>>>>>>>>> Flink CLI.
>>>>>>>>>>> Maybe we could make it a follow-up FLIP, which includes the
>>>>>>>> modifications
>>>>>>>>>>> to
>>>>>>>>>>> clusterClient/jobClient and Flink CLI. WDYT?
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Paul Lam
>>>>>>>>>>> 
>>>>>>>>>>>> 2022年5月17日 20:34，godfrey he <[email protected]> 写道：
>>>>>>>>>>>> 
>>>>>>>>>>>> Godfrey
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>>>>

Re: [DISCUSS] FLIP-222: Support full query lifecycle statements in SQL client

Reply via email to