Hi Paul,

Filed a ticket: https://issues.apache.org/jira/browse/FLINK-27977 for savepoints housekeeping.
Best regards,
Jing

On Thu, Jun 9, 2022 at 10:37 AM Martijn Visser <martijnvis...@apache.org> wrote:

> Hi Paul,
>
> That's a fair point, but I still think we should not offer that capability
> via the CLI either. But that's a different discussion :)
>
> Thanks,
>
> Martijn
>
> On Thu, Jun 9, 2022 at 10:08, Paul Lam <paullin3...@gmail.com> wrote:
>
>> Hi Martijn,
>>
>> I think the `DROP SAVEPOINT` statement would not conflict with NO_CLAIM
>> mode, since the statement is triggered by users instead of the Flink runtime.
>>
>> We're simply providing a tool for users to clean up their savepoints, just
>> like `bin/flink savepoint -d :savepointPath` in the Flink CLI [1].
>>
>> [1] https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/ops/state/savepoints/#disposing-savepoints
>>
>> Best,
>> Paul Lam
>>
>> On Jun 9, 2022, at 15:41, Martijn Visser <martijnvis...@apache.org> wrote:
>>
>> Hi all,
>>
>> I would not include a DROP SAVEPOINT syntax. With the recently introduced
>> CLAIM/NO CLAIM mode, I would argue that we've just clarified snapshot
>> ownership, and if you have a savepoint established "with NO_CLAIM it creates
>> its own copy and leaves the existing one up to the user." [1] We shouldn't
>> then again make it fuzzy by making it possible for Flink to remove
>> snapshots.
>>
>> Best regards,
>>
>> Martijn
>>
>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-193%3A+Snapshots+ownership
>>
>> On Thu, Jun 9, 2022 at 09:27, Paul Lam <paullin3...@gmail.com> wrote:
>>
>>> Hi team,
>>>
>>> It's great to see our opinions are finally converging!
>>>
>>> `STOP JOB <job_id> [WITH SAVEPOINT] [WITH DRAIN]`
>>>
>>> LGTM. Adding it to the FLIP.
>>>
>>> To Jark,
>>>
>>> We can simplify the statement to "CREATE SAVEPOINT FOR JOB <job_id>"
>>>
>>> Good point. The default savepoint dir should be enough for most cases.
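As an aside, the two savepoint statements discussed here map naturally onto Flink's REST API: triggering corresponds to POST /jobs/:jobid/savepoints and disposal to POST /savepoint-disposal. A minimal Python sketch of assembling those requests (endpoint paths per the Flink REST docs; the function names and the sample job id are hypothetical, and nothing here contacts a real cluster):

```python
# Sketch: build the REST requests behind CREATE SAVEPOINT FOR JOB <job_id>
# and DROP SAVEPOINT <savepoint_path>. We only assemble (method, path, body)
# tuples; sending them is left to an HTTP client.

def create_savepoint_request(job_id, target_directory=None):
    """POST /jobs/:jobid/savepoints. Omitting target-directory lets the
    cluster fall back to state.savepoints.dir, as Jark suggests above."""
    body = {"cancel-job": False}
    if target_directory is not None:
        body["target-directory"] = target_directory
    return ("POST", f"/jobs/{job_id}/savepoints", body)

def drop_savepoint_request(savepoint_path):
    """POST /savepoint-disposal, the REST twin of `bin/flink savepoint -d`."""
    return ("POST", "/savepoint-disposal", {"savepoint-path": savepoint_path})

method, path, body = create_savepoint_request("a1b2c3d4")  # hypothetical job id
print(method, path, body)
```

Both calls are asynchronous in the real API (they return a trigger/request id to poll), which a full implementation would have to handle.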
>>>
>>> To Jing,
>>>
>>> DROP SAVEPOINT ALL
>>>
>>> I think it's valid to have such a statement, but I have two concerns:
>>>
>>> - `ALL` is already an SQL keyword, thus it may cause ambiguity.
>>> - The Flink CLI and REST API don't provide the corresponding
>>> functionalities, and we'd better keep them aligned.
>>>
>>> How about making this statement a follow-up task, which would also touch the
>>> REST API and Flink CLI?
>>>
>>> Best,
>>> Paul Lam
>>>
>>> On Jun 9, 2022, at 11:53, godfrey he <godfre...@gmail.com> wrote:
>>>
>>> Hi all,
>>>
>>> Regarding `PIPELINE`, it comes from the flink-core module; see the
>>> `PipelineOptions` class for more details.
>>> `JOBS` is a more generic concept than `PIPELINES`. I'm also fine with
>>> `JOBS`.
>>>
>>> +1 to discuss JOBTREE in another FLIP.
>>>
>>> +1 to `STOP JOB <job_id> [WITH SAVEPOINT] [WITH DRAIN]`
>>>
>>> +1 to `CREATE SAVEPOINT FOR JOB <job_id>` and `DROP SAVEPOINT <savepoint_path>`
>>>
>>> Best,
>>> Godfrey
>>>
>>> On Thu, Jun 9, 2022 at 01:48, Jing Ge <j...@ververica.com> wrote:
>>>
>>> Hi Paul, Hi Jark,
>>>
>>> Re JOBTREE: agreed that it is out of the scope of this FLIP.
>>>
>>> Re `RELEASE SAVEPOINT ALL`: if the community prefers `DROP`, then `DROP
>>> SAVEPOINT ALL` for housekeeping. WDYT?
>>>
>>> Best regards,
>>> Jing
>>>
>>> On Wed, Jun 8, 2022 at 2:54 PM Jark Wu <imj...@gmail.com> wrote:
>>>
>>> Hi Jing,
>>>
>>> Regarding JOBTREE (job lineage), I agree with Paul that this is out of the scope
>>> of this FLIP and can be discussed in another FLIP.
>>>
>>> Job lineage is a big topic that may involve many problems:
>>> 1) How to collect and report job entities, attributes, and lineages?
>>> 2) How to integrate with data catalogs, e.g. Apache Atlas, DataHub?
>>> 3) How does the Flink SQL CLI/Gateway know the lineage information and show the jobtree?
>>> 4) ...
>>>
>>> Best,
>>> Jark
>>>
>>> On Wed, 8 Jun 2022 at 20:44, Jark Wu <imj...@gmail.com> wrote:
>>>
>>> Hi Paul,
>>>
>>> I'm fine with using JOBS.
>>> The only concern is that this may conflict with displaying more detailed
>>> information for queries (e.g. query content, plan) in the future, e.g.
>>> SHOW QUERIES EXTENDED in ksqlDB [1].
>>> This is not a big problem, as we can introduce SHOW QUERIES in the future
>>> if necessary.
>>>
>>> STOP JOBS <job_id> (with options `table.job.stop-with-savepoint` and
>>> `table.job.stop-with-drain`)
>>>
>>> What about STOP JOB <job_id> [WITH SAVEPOINT] [WITH DRAIN]?
>>> It might be trivial and error-prone to set configuration before
>>> executing a statement, and the configuration will affect all statements after that.
>>>
>>> CREATE SAVEPOINT <savepoint_path> FOR JOB <job_id>
>>>
>>> We can simplify the statement to "CREATE SAVEPOINT FOR JOB <job_id>"
>>> and always use the configuration "state.savepoints.dir" as the default
>>> savepoint dir.
>>> The concern with using "<savepoint_path>" is that the parameter here should be
>>> a savepoint dir, while the savepoint_path is the returned value.
>>>
>>> I'm fine with the other changes.
>>>
>>> Thanks,
>>> Jark
>>>
>>> [1]: https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/show-queries/
>>>
>>> On Wed, 8 Jun 2022 at 15:07, Paul Lam <paullin3...@gmail.com> wrote:
>>>
>>> Hi Jing,
>>>
>>> Thank you for your inputs!
>>>
>>> TBH, I hadn't considered the ETL scenario that you mentioned. I think
>>> those jobs are managed just like other jobs in terms of job lifecycles (please
>>> correct me if I'm wrong).
>>>
>>> WRT the SQL statements about SQL lineage, I think it might be a
>>> little bit out of the scope of this FLIP, since it's mainly about
>>> lifecycles. By the way, do we have these functionalities in the Flink CLI or
>>> REST API already?
>>>
>>> WRT `RELEASE SAVEPOINT ALL`, I'm sorry for the outdated FLIP docs; the
>>> community is more in favor of `DROP SAVEPOINT <savepoint_path>`. I'm
>>> updating the FLIP according to the latest discussions.
>>>
>>> Best,
>>> Paul Lam
>>>
>>> On Jun 8, 2022, at 07:31, Jing Ge <j...@ververica.com> wrote:
>>>
>>> Hi Paul,
>>>
>>> Sorry that I am a little bit late to join this thread. Thanks for
>>> driving this and starting this informative discussion. The FLIP looks
>>> really interesting. It will help us a lot to manage Flink SQL jobs.
>>>
>>> Have you considered the ETL scenario with Flink SQL, where multiple SQL
>>> statements build a DAG, with many such DAGs in a deployment?
>>>
>>> 1)
>>> +1 for SHOW JOBS. I think sooner or later we will start to discuss how
>>> to support ETL jobs. Briefly speaking, the SQL statements used to build the DAG are
>>> responsible for *producing* data as the result (cube, materialized view, etc.)
>>> for future consumption by queries. The INSERT INTO SELECT FROM example
>>> in the FLIP and CTAS are typical SQL statements in this case. I would prefer to call
>>> them jobs instead of queries.
>>>
>>> 2)
>>> Speaking of the ETL DAG, we might want to see the lineage. Is it possible to
>>> support syntax like:
>>>
>>> SHOW JOBTREE <job_id> // shows the downstream DAG from the given job_id
>>> SHOW JOBTREE <job_id> FULL // shows the whole DAG that contains the given job_id
>>> SHOW JOBTREES // shows all DAGs
>>> SHOW ANCESTORS <job_id> // shows all parents of the given job_id
>>>
>>> 3)
>>> Could we also support savepoint housekeeping syntax? We ran into the
>>> issue that a lot of savepoints had been created by customers (via their
>>> apps), and it takes extra (hacky) effort to clean them up.
>>>
>>> RELEASE SAVEPOINT ALL
>>>
>>> Best regards,
>>> Jing
>>>
>>> On Tue, Jun 7, 2022 at 2:35 PM Martijn Visser <martijnvis...@apache.org> wrote:
>>>
>>> Hi Paul,
>>>
>>> I'm still doubting the keyword for the SQL applications. SHOW QUERIES could
>>> imply that this will actually show the query, but we're returning the IDs of
>>> the running applications.
>>> At first I was also not very much in favour of
>>> SHOW JOBS, since I prefer calling them 'Flink applications' and not 'Flink
>>> jobs', but the glossary [1] made me reconsider. I would +1 SHOW/STOP JOBS.
>>>
>>> Also +1 for the CREATE/SHOW/DROP SAVEPOINT syntax.
>>>
>>> Best regards,
>>>
>>> Martijn
>>>
>>> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/concepts/glossary
>>>
>>> On Sat, Jun 4, 2022 at 10:38, Paul Lam <paullin3...@gmail.com> wrote:
>>>
>>> Hi Godfrey,
>>>
>>> Sorry for the late reply, I was on vacation.
>>>
>>> It looks like we have a variety of preferences on the syntax; how about we
>>> choose the most acceptable one?
>>>
>>> WRT the keyword for SQL jobs, we use JOBS, thus the statements related to jobs
>>> would be:
>>>
>>> - SHOW JOBS
>>> - STOP JOBS <job_id> (with options `table.job.stop-with-savepoint` and
>>> `table.job.stop-with-drain`)
>>>
>>> WRT savepoints for SQL jobs, we use the `CREATE/DROP` pattern with `FOR JOB`:
>>>
>>> - CREATE SAVEPOINT <savepoint_path> FOR JOB <job_id>
>>> - SHOW SAVEPOINTS FOR JOB <job_id> (show savepoints the current job
>>> manager remembers)
>>> - DROP SAVEPOINT <savepoint_path>
>>>
>>> cc @Jark @ShengKai @Martijn @Timo.
>>>
>>> Best,
>>> Paul Lam
>>>
>>> On Mon, May 23, 2022 at 21:34, godfrey he <godfre...@gmail.com> wrote:
>>>
>>> Hi Paul,
>>>
>>> Thanks for the update.
>>>
>>> 'SHOW QUERIES' lists all jobs in the cluster, no limit on APIs
>>> (DataStream or SQL) or clients (SQL client or CLI).
>>>
>>> Is a DataStream job a QUERY? I think not.
>>> For a QUERY, the most important concept is the statement, but the
>>> result does not contain this info.
>>> If we need to contain all jobs in the cluster, I think the name should
>>> be JOB or PIPELINE.
>>> I lean towards SHOW PIPELINES and STOP PIPELINE [IF RUNNING] id.
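Whichever keyword wins (JOBS, QUERIES, or PIPELINES), the listing itself can be served from the jobs overview that the REST API and ClusterClient already expose. A hedged sketch of shaping such a response into SHOW JOBS rows; the sample payload and field names are modeled on the /jobs/overview endpoint but should be treated as assumptions, not a captured response:

```python
# Sketch: turn a /jobs/overview-style response into SHOW JOBS rows
# (job id, job name, status, start time). Sample data is made up.

def show_jobs(overview):
    rows = []
    for job in overview.get("jobs", []):
        rows.append((job["jid"], job["name"], job["state"], job["start-time"]))
    return rows

sample = {
    "jobs": [
        {"jid": "deadbeef0001", "name": "insert-into_sink",
         "state": "RUNNING", "start-time": 1654761600000},
        {"jid": "deadbeef0002", "name": "ctas_cube",
         "state": "FINISHED", "start-time": 1654761700000},
    ]
}
for row in show_jobs(sample):
    print(row)
```

Note this deliberately returns every job, DataStream or SQL, which matches Paul's position that non-SQL jobs may be listed (and managed) from the SQL client too.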
>>>
>>> SHOW SAVEPOINTS
>>>
>>> To list the savepoints for a specific job, we need to specify a
>>> specific pipeline; the syntax should be SHOW SAVEPOINTS FOR PIPELINE id.
>>>
>>> Best,
>>> Godfrey
>>>
>>> On Fri, May 20, 2022 at 11:25, Paul Lam <paullin3...@gmail.com> wrote:
>>>
>>> Hi Jark,
>>>
>>> WRT "DROP QUERY", I agree that it's not very intuitive, and that's
>>> part of the reason why I proposed "STOP/CANCEL QUERY" at the
>>> beginning. The downside is that it's not ANSI-SQL compatible.
>>>
>>> Another question is, what should be the syntax for ungracefully
>>> canceling a query? As ShengKai pointed out in an offline discussion,
>>> "STOP QUERY" and "CANCEL QUERY" might confuse SQL users.
>>> The Flink CLI has both stop and cancel, mostly due to historical reasons.
>>>
>>> WRT "SHOW SAVEPOINT", I agree it's a missing part. My concern is
>>> that savepoints are owned by users and live beyond the lifecycle of a Flink
>>> cluster. For example, a user might take a savepoint at a custom path
>>> that's different from the default savepoint path; I think the jobmanager would
>>> not remember that, not to mention the jobmanager may be a fresh new
>>> one after a cluster restart. Thus if we support "SHOW SAVEPOINT", it's
>>> probably a best-effort one.
>>>
>>> WRT savepoint syntax, I'm thinking of the semantics of the savepoint.
>>> Savepoints are aliases for nested transactions in the DB area [1], and there are
>>> correspondingly global transactions. If we consider Flink jobs as
>>> global transactions and Flink checkpoints as nested transactions,
>>> then the savepoint semantics are close; thus I think the savepoint syntax
>>> in the SQL standard could be considered. But again, I don't have a very
>>> strong preference.
>>>
>>> Pinging @Timo to get more inputs.
>>>
>>> [1] https://en.wikipedia.org/wiki/Nested_transaction
>>>
>>> Best,
>>> Paul Lam
>>>
>>> On May 18, 2022, at 17:48, Jark Wu <imj...@gmail.com> wrote:
>>>
>>> Hi Paul,
>>>
>>> 1) SHOW QUERIES
>>> +1 to add the finished time, but it would be better to call it "end_time" to
>>> keep it aligned with the names in the Web UI.
>>>
>>> 2) DROP QUERY
>>> I think we shouldn't throw exceptions for batch jobs; otherwise, how to
>>> stop batch queries?
>>> At present, I don't think "DROP" is a suitable keyword for this statement.
>>> From the perspective of users, "DROP" sounds like the query should be
>>> removed from the list of "SHOW QUERIES". However, it isn't. Maybe "STOP QUERY"
>>> is more suitable and compliant with the commands of the Flink CLI.
>>>
>>> 3) SHOW SAVEPOINTS
>>> I think this statement is needed; otherwise, savepoints are lost after the
>>> SAVEPOINT command is executed. Savepoints can be retrieved from the REST API
>>> "/jobs/:jobid/checkpoints" [1]
>>> with the filter "checkpoint_type"="savepoint". It's also worth considering
>>> providing "SHOW CHECKPOINTS" to list all checkpoints.
>>>
>>> 4) SAVEPOINT & RELEASE SAVEPOINT
>>> I'm a little concerned with the SAVEPOINT and RELEASE SAVEPOINT statements
>>> now.
>>> In the other vendors, the parameters of SAVEPOINT and RELEASE SAVEPOINT are
>>> both the same savepoint id.
>>> However, in our syntax, the first one is a query id, and the second one is a
>>> savepoint path, which is confusing and not consistent. When I came across
>>> SHOW SAVEPOINT, I thought maybe they should be in the same syntax set.
>>> For example, CREATE SAVEPOINT FOR [QUERY] <query_id> & DROP SAVEPOINT <sp_path>.
>>> That means we don't follow the majority of vendors in the SAVEPOINT commands. I
>>> would say the purpose is different in Flink.
>>> What are others' opinions on this?
>>>
>>> Best,
>>> Jark
>>>
>>> [1]: https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid-checkpoints
>>>
>>> On Wed, 18 May 2022 at 14:43, Paul Lam <paullin3...@gmail.com> wrote:
>>>
>>> Hi Godfrey,
>>>
>>> Thanks a lot for your inputs!
>>>
>>> 'SHOW QUERIES' lists all jobs in the cluster, no limit on APIs (DataStream or SQL)
>>> or clients (SQL client or CLI). Under the hood, it's based on
>>> ClusterClient#listJobs, the same as the Flink CLI. I think it's okay to have
>>> non-SQL jobs listed in the SQL client, because these jobs can be managed via
>>> the SQL client too.
>>>
>>> WRT the finished time, I think you're right. Adding it to the FLIP. But I'm a
>>> bit afraid that the rows would be too long.
>>>
>>> WRT 'DROP QUERY':
>>>
>>> What's the behavior for batch jobs and non-running jobs?
>>>
>>> In general, the behavior would be aligned with the Flink CLI. Triggering a
>>> savepoint for a non-running job would cause errors, and the error message would
>>> be printed to the SQL client. Triggering a savepoint for batch (unbounded) jobs in
>>> streaming execution mode would be the same as for streaming jobs. However, for
>>> batch jobs in batch execution mode, I think there would be an error, because
>>> batch execution doesn't support checkpoints currently (please correct me if
>>> I'm wrong).
>>>
>>> WRT 'SHOW SAVEPOINTS', I've thought about it, but the Flink clusterClient/
>>> jobClient doesn't have such a functionality at the moment, and neither does the
>>> Flink CLI. Maybe we could make it a follow-up FLIP, which includes the
>>> modifications to the clusterClient/jobClient and Flink CLI. WDYT?
>>>
>>> Best,
>>> Paul Lam
>>>
>>> On May 17, 2022, at 20:34, godfrey he <godfre...@gmail.com> wrote:
>>>
>>> Godfrey
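For completeness, the best-effort SHOW SAVEPOINTS approach Jark sketched earlier in the thread (query the checkpoint history at /jobs/:jobid/checkpoints and keep entries whose checkpoint_type is "savepoint") can be prototyped as below. The payload shape is an assumption modeled on that endpoint, not a captured response, and the paths are invented:

```python
# Sketch: best-effort SHOW SAVEPOINTS by filtering a job's checkpoint
# history for completed entries of type "savepoint". This only sees what
# the current jobmanager remembers, matching Paul's caveat above.

def show_savepoints(history):
    return [
        (cp["id"], cp["external_path"])
        for cp in history.get("history", [])
        if cp.get("checkpoint_type") == "savepoint"
        and cp.get("status") == "COMPLETED"
    ]

sample = {
    "history": [
        {"id": 41, "checkpoint_type": "checkpoint", "status": "COMPLETED",
         "external_path": "hdfs:///checkpoints/chk-41"},
        {"id": 42, "checkpoint_type": "savepoint", "status": "COMPLETED",
         "external_path": "hdfs:///savepoints/savepoint-42"},
    ]
}
print(show_savepoints(sample))  # only the savepoint entry survives
```

Savepoints taken to custom paths by a previous jobmanager would be invisible here, which is exactly why the thread treats SHOW SAVEPOINTS as best-effort.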