Hi Paul, hi Jark,

Re JOBTREE, I agree that it is out of the scope of this FLIP.
Re `RELEASE SAVEPOINT ALL`, if the community prefers 'DROP', then 'DROP SAVEPOINT ALL' could do the housekeeping. WDYT?

Best regards,
Jing

On Wed, Jun 8, 2022 at 2:54 PM Jark Wu <imj...@gmail.com> wrote:

> Hi Jing,
>
> Regarding JOBTREE (job lineage), I agree with Paul that this is out of the scope of this FLIP and can be discussed in another FLIP.
>
> Job lineage is a big topic that may involve many problems:
> 1) how to collect and report job entities, attributes, and lineages?
> 2) how to integrate with data catalogs, e.g. Apache Atlas, DataHub?
> 3) how does Flink SQL CLI/Gateway know the lineage information and show the jobtree?
> 4) ...
>
> Best,
> Jark
>
> On Wed, 8 Jun 2022 at 20:44, Jark Wu <imj...@gmail.com> wrote:
>
>> Hi Paul,
>>
>> I'm fine with using JOBS. The only concern is that this may conflict with displaying more detailed information for queries (e.g. query content, plan) in the future, e.g. SHOW QUERIES EXTENDED in ksqldb[1]. This is not a big problem as we can introduce SHOW QUERIES in the future if necessary.
>>
>> > STOP JOBS <job_id> (with options `table.job.stop-with-savepoint` and `table.job.stop-with-drain`)
>> What about STOP JOB <job_id> [WITH SAVEPOINT] [WITH DRAIN]? It might be tedious and error-prone to set configuration before executing a statement, and the configuration will affect all statements after that.
>>
>> > CREATE SAVEPOINT <savepoint_path> FOR JOB <job_id>
>> We can simplify the statement to "CREATE SAVEPOINT FOR JOB <job_id>", and always use configuration "state.savepoints.dir" as the default savepoint dir. The concern with using "<savepoint_path>" is that it should be a savepoint dir, while savepoint_path is the returned value.
>>
>> I'm fine with other changes.
>>
>> Thanks,
>> Jark
>>
>> [1]: https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/show-queries/
>>
>> On Wed, 8 Jun 2022 at 15:07, Paul Lam <paullin3...@gmail.com> wrote:
>>
>>> Hi Jing,
>>>
>>> Thank you for your inputs!
>>>
>>> TBH, I haven’t considered the ETL scenario that you mentioned. I think they’re managed just like other jobs in terms of job lifecycles (please correct me if I’m wrong).
>>>
>>> WRT the SQL statements about SQL lineages, I think it might be a little bit out of the scope of the FLIP, since it’s mainly about lifecycles. By the way, do we have these functionalities in the Flink CLI or REST API already?
>>>
>>> WRT `RELEASE SAVEPOINT ALL`, I’m sorry for the deprecated FLIP docs; the community is more in favor of `DROP SAVEPOINT <savepoint_path>`. I’m updating the FLIP according to the latest discussions.
>>>
>>> Best,
>>> Paul Lam
>>>
>>> On Jun 8, 2022, at 07:31, Jing Ge <j...@ververica.com> wrote:
>>>
>>> Hi Paul,
>>>
>>> Sorry that I am a little bit too late to join this thread. Thanks for driving this and starting this informative discussion. The FLIP looks really interesting. It will help us a lot to manage Flink SQL jobs.
>>>
>>> Have you considered the ETL scenario with Flink SQL, where multiple SQLs build a DAG, for many DAGs?
>>>
>>> 1)
>>> +1 for SHOW JOBS. I think sooner or later we will start to discuss how to support ETL jobs. Briefly speaking, the SQLs that are used to build the DAG are responsible for *producing* data as the result (cube, materialized view, etc.) for future consumption by queries. The INSERT INTO SELECT FROM example in the FLIP and CTAS are typical SQL in this case. I would prefer to call them Jobs instead of Queries.
>>>
>>> 2)
>>> Speaking of the ETL DAG, we might want to see the lineage.
Is it possible to support syntax like:
>>>
>>> SHOW JOBTREE <job_id> // shows the downstream DAG from the given job_id
>>> SHOW JOBTREE <job_id> FULL // shows the whole DAG that contains the given job_id
>>> SHOW JOBTREES // shows all DAGs
>>> SHOW ANCESTORS <job_id> // shows all parents of the given job_id
>>>
>>> 3)
>>> Could we also support savepoint housekeeping syntax? We ran into the issue that a lot of savepoints have been created by customers (via their apps). It will take extra (hacking) effort to clean them up.
>>>
>>> RELEASE SAVEPOINT ALL
>>>
>>> Best regards,
>>> Jing
>>>
>>> On Tue, Jun 7, 2022 at 2:35 PM Martijn Visser <martijnvis...@apache.org> wrote:
>>>
>>>> Hi Paul,
>>>>
>>>> I'm still doubting the keyword for the SQL applications. SHOW QUERIES could imply that this will actually show the query, but we're returning IDs of the running applications. At first I was also not very much in favour of SHOW JOBS since I prefer calling it 'Flink applications' and not 'Flink jobs', but the glossary [1] made me reconsider. I would +1 SHOW/STOP JOBS.
>>>>
>>>> Also +1 for the CREATE/SHOW/DROP SAVEPOINT syntax.
>>>>
>>>> Best regards,
>>>>
>>>> Martijn
>>>>
>>>> [1] https://nightlies.apache.org/flink/flink-docs-master/docs/concepts/glossary
>>>>
>>>> On Sat, Jun 4, 2022 at 10:38, Paul Lam <paullin3...@gmail.com> wrote:
>>>>
>>>> > Hi Godfrey,
>>>> >
>>>> > Sorry for the late reply, I was on vacation.
>>>> >
>>>> > It looks like we have a variety of preferences on the syntax; how about we choose the most acceptable one?
>>>> >
>>>> > WRT the keyword for SQL jobs, we use JOBS; thus the statements related to jobs would be:
>>>> >
>>>> > - SHOW JOBS
>>>> > - STOP JOBS <job_id> (with options `table.job.stop-with-savepoint` and `table.job.stop-with-drain`)
>>>> >
>>>> > WRT savepoints for SQL jobs, we use the `CREATE/DROP` pattern with `FOR JOB`:
>>>> >
>>>> > - CREATE SAVEPOINT <savepoint_path> FOR JOB <job_id>
>>>> > - SHOW SAVEPOINTS FOR JOB <job_id> (show savepoints the current job manager remembers)
>>>> > - DROP SAVEPOINT <savepoint_path>
>>>> >
>>>> > cc @Jark @ShengKai @Martijn @Timo .
>>>> >
>>>> > Best,
>>>> > Paul Lam
>>>> >
>>>> > godfrey he <godfre...@gmail.com> wrote on Mon, May 23, 2022 at 21:34:
>>>> >
>>>> >> Hi Paul,
>>>> >>
>>>> >> Thanks for the update.
>>>> >>
>>>> >> > 'SHOW QUERIES' lists all jobs in the cluster, no limit on APIs (DataStream or SQL) or clients (SQL client or CLI).
>>>> >>
>>>> >> Is a DataStream job a QUERY? I think not. For a QUERY, the most important concept is the statement. But the result does not contain this info. If we need to include all jobs in the cluster, I think the name should be JOB or PIPELINE. I lean to SHOW PIPELINES and STOP PIPELINE [IF RUNNING] id.
>>>> >>
>>>> >> > SHOW SAVEPOINTS
>>>> >> To list the savepoints for a specific job, we need to specify a specific pipeline; the syntax should be SHOW SAVEPOINTS FOR PIPELINE id.
>>>> >>
>>>> >> Best,
>>>> >> Godfrey
>>>> >>
>>>> >> Paul Lam <paullin3...@gmail.com> wrote on Fri, May 20, 2022 at 11:25:
>>>> >> >
>>>> >> > Hi Jark,
>>>> >> >
>>>> >> > WRT “DROP QUERY”, I agree that it’s not very intuitive, and that’s part of the reason why I proposed “STOP/CANCEL QUERY” at the beginning. The downside of it is that it’s not ANSI-SQL compatible.
>>>> >> >
>>>> >> > Another question is, what should be the syntax for ungracefully canceling a query?
As ShengKai pointed out in an offline discussion, “STOP QUERY” and “CANCEL QUERY” might confuse SQL users. The Flink CLI has both stop and cancel, mostly due to historical problems.
>>>> >> >
>>>> >> > WRT “SHOW SAVEPOINT”, I agree it’s a missing part. My concern is that savepoints are owned by users and beyond the lifecycle of a Flink cluster. For example, a user might take a savepoint at a custom path that’s different from the default savepoint path; I think the jobmanager would not remember that, not to mention the jobmanager may be a fresh new one after a cluster restart. Thus if we support “SHOW SAVEPOINT”, it's probably a best-effort one.
>>>> >> >
>>>> >> > WRT savepoint syntax, I’m thinking of the semantics of the savepoint. Savepoints are aliases for nested transactions in the DB area[1], and there are correspondingly global transactions. If we consider Flink jobs as global transactions and Flink checkpoints as nested transactions, then the savepoint semantics are close; thus I think the savepoint syntax in the SQL standard could be considered. But again, I don’t have a very strong preference.
>>>> >> >
>>>> >> > Ping @Timo to get more inputs.
>>>> >> >
>>>> >> > [1] https://en.wikipedia.org/wiki/Nested_transaction
>>>> >> >
>>>> >> > Best,
>>>> >> > Paul Lam
>>>> >> >
>>>> >> > > On May 18, 2022, at 17:48, Jark Wu <imj...@gmail.com> wrote:
>>>> >> > >
>>>> >> > > Hi Paul,
>>>> >> > >
>>>> >> > > 1) SHOW QUERIES
>>>> >> > > +1 to add finished time, but it would be better to call it "end_time" to keep aligned with the names in the Web UI.
>>>> >> > >
>>>> >> > > 2) DROP QUERY
>>>> >> > > I think we shouldn't throw exceptions for batch jobs; otherwise, how do we stop batch queries?
>>>> >> > > At present, I don't think "DROP" is a suitable keyword for this statement. From the perspective of users, "DROP" sounds like the query should be removed from the list of "SHOW QUERIES". However, it isn't. Maybe "STOP QUERY" is more suitable and compliant with the commands of the Flink CLI.
>>>> >> > >
>>>> >> > > 3) SHOW SAVEPOINTS
>>>> >> > > I think this statement is needed; otherwise, savepoints are lost after the SAVEPOINT command is executed. Savepoints can be retrieved from the REST API "/jobs/:jobid/checkpoints" by filtering on "checkpoint_type"="savepoint". It's also worth considering providing "SHOW CHECKPOINTS" to list all checkpoints.
>>>> >> > >
>>>> >> > > 4) SAVEPOINT & RELEASE SAVEPOINT
>>>> >> > > I'm a little concerned with the SAVEPOINT and RELEASE SAVEPOINT statements now. In other vendors' databases, the parameters of SAVEPOINT and RELEASE SAVEPOINT are both the same savepoint id. However, in our syntax, the first one is a query id, and the second one is a savepoint path, which is confusing and not consistent. When I came across SHOW SAVEPOINT, I thought maybe they should be in the same syntax set. For example, CREATE SAVEPOINT FOR [QUERY] <query_id> & DROP SAVEPOINT <sp_path>. That means we don't follow the majority of vendors in SAVEPOINT commands; I would say the purpose is different in Flink. What are others' opinions on this?
>>>> >> > >
>>>> >> > > Best,
>>>> >> > > Jark
>>>> >> > >
>>>> >> > > [1]: https://nightlies.apache.org/flink/flink-docs-master/docs/ops/rest_api/#jobs-jobid-checkpoints
>>>> >> > >
>>>> >> > > On Wed, 18 May 2022 at 14:43, Paul Lam <paullin3...@gmail.com> wrote:
>>>> >> > >
>>>> >> > >> Hi Godfrey,
>>>> >> > >>
>>>> >> > >> Thanks a lot for your inputs!
>>>> >> > >>
>>>> >> > >> 'SHOW QUERIES' lists all jobs in the cluster, no limit on APIs (DataStream or SQL) or clients (SQL client or CLI). Under the hood, it’s based on ClusterClient#listJobs, the same as the Flink CLI. I think it’s okay to have non-SQL jobs listed in the SQL client, because these jobs can be managed via the SQL client too.
>>>> >> > >>
>>>> >> > >> WRT finished time, I think you’re right. Adding it to the FLIP. But I’m a bit afraid that the rows would be too long.
>>>> >> > >>
>>>> >> > >> WRT ‘DROP QUERY’,
>>>> >> > >>> What's the behavior for batch jobs and the non-running jobs?
>>>> >> > >>
>>>> >> > >> In general, the behavior would be aligned with the Flink CLI. Triggering a savepoint for a non-running job would cause errors, and the error message would be printed to the SQL client. Triggering a savepoint for batch (bounded) jobs in streaming execution mode would be the same as for streaming jobs. However, for batch jobs in batch execution mode, I think there would be an error, because batch execution doesn’t support checkpoints currently (please correct me if I’m wrong).
>>>> >> > >>
>>>> >> > >> WRT ‘SHOW SAVEPOINTS’, I’ve thought about it, but the Flink clusterClient/jobClient doesn’t have such functionality at the moment, and neither does the Flink CLI. Maybe we could make it a follow-up FLIP, which includes the modifications to the clusterClient/jobClient and the Flink CLI. WDYT?
>>>> >> > >>
>>>> >> > >> Best,
>>>> >> > >> Paul Lam
>>>> >> > >>
>>>> >> > >>> On May 17, 2022, at 20:34, godfrey he <godfre...@gmail.com> wrote:
>>>> >> > >>>
>>>> >> > >>> Godfrey
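[Editor's note: for readers skimming the archive, the statements being converged on in this thread can be summarized as the sketch below. This is only an illustration of the syntax proposals quoted above — the FLIP itself is the source of truth, and the exact keywords (JOB vs. QUERY vs. PIPELINE) were still under debate at this point in the discussion.]

```sql
-- List all jobs known to the cluster, SQL and non-SQL alike
-- (JOBS rather than QUERIES, per the glossary-based consensus above)
SHOW JOBS;

-- Jark's suggested inline form, replacing the configuration options
-- `table.job.stop-with-savepoint` / `table.job.stop-with-drain`
STOP JOB '<job_id>' [WITH SAVEPOINT] [WITH DRAIN];

-- Jark's simplification: the savepoint dir defaults to
-- `state.savepoints.dir`, and the savepoint path is the returned value
CREATE SAVEPOINT FOR JOB '<job_id>';

-- Best-effort: lists only savepoints the current JobManager remembers
SHOW SAVEPOINTS FOR JOB '<job_id>';

-- Deletes a savepoint by the path returned from CREATE SAVEPOINT
DROP SAVEPOINT '<savepoint_path>';
```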