Hi. Thanks for Paul's update.
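For quick reference, here is a rough sketch of how the statements proposed in FLIP-222 might look from the SQL client. This is only an illustration: the exact keywords (QUERY vs. JOB/TASK) and the precise stop/cancel semantics are exactly what is still under discussion in the thread below; the comments assume the CLI-aligned behaviour described in the proposal.

```
-- Sketch only: keywords and exact semantics are still under discussion in this thread.
SHOW QUERIES;                         -- list queries/jobs
STOP QUERY <query_id>;                -- graceful stop, mirroring the CLI's stop action
CANCEL QUERY <query_id>;              -- hard cancel, mirroring the CLI's cancel action
TRIGGER SAVEPOINT <savepoint_path>;   -- trigger a savepoint
DISPOSE SAVEPOINT <savepoint_path>;   -- dispose of a savepoint
```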

> It would be better if we could also get the info about the cluster where the job
> is running through the DESCRIBE statement.

I just wonder how users can get the Web UI in application mode. Therefore, it would
be better if we could also list the Web UI using the SHOW statement. WDYT?

> QUERY or other keywords.

I've listed the statements that other systems use to manage the lifecycle of
queries/DMLs:

MySQL [1] allows users to run SHOW [FULL] PROCESSLIST and use the KILL command to
kill a query [2]:

```
mysql> SHOW PROCESSLIST;
mysql> KILL 27;
```

Postgres uses the following statements to cancel or terminate queries [3]:

```
SELECT pg_cancel_backend(<pid of the process>);
SELECT pg_terminate_backend(<pid of the process>);
```

KSQL uses the following commands to control the query lifecycle [4][5]:

```
SHOW QUERIES;
TERMINATE <query id>;
```

[1] https://dev.mysql.com/doc/refman/8.0/en/show-processlist.html
[2] https://scaledynamix.com/blog/how-to-kill-mysql-queries/
[3] https://stackoverflow.com/questions/35319597/how-to-stop-kill-a-query-in-postgresql
[4] https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/show-queries/
[5] https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/terminate/

After this investigation, I'm fine with QUERY, but the keyword JOB is also okay to me.

We also have two questions here:

1. Could you add some details about the behaviour under the different
execution.target settings, e.g. session and application mode?
2. Considering the SQL Client/Gateway is not limited to submitting jobs to a single
cluster, is it able to list jobs in other clusters?

Best,
Shengkai

Paul Lam <paullin3...@gmail.com> wrote on Thu, Apr 28, 2022 at 17:17:

> Hi Martijn,
>
> Thanks a lot for your reply! I agree that the scope may be a bit confusing,
> so please let me clarify.
>
> The FLIP aims to add new SQL statements that are supported only in the
> sql-client, similar to jar statements [1]. Jar statements can be parsed into
> jar operations, which are used only in CliClient in the sql-client module and
> cannot be executed by TableEnvironment (i.e. not available in the Table API
> programs that contain SQL that you mentioned).
>
> WRT the unchanged CLI client, I mean CliClient instead of the sql-client
> module, which currently contains the gateway code (e.g. Executor). The FLIP
> mainly extends the gateway part, and barely touches CliClient and the REST
> server (the REST endpoint in FLIP-91).
>
> WRT the syntax, I don't have much experience with SQL standards, and I'd like
> to hear more opinions from the community. I prefer Hive-style syntax because I
> think many users are familiar with Hive, and there are ongoing efforts to
> improve Flink-Hive integration [2][3]. But my preference is not strong; I'm
> okay with other options too. Do you think JOB/TASK is a good choice, or do you
> have other preferred keywords?
>
> [1] https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/table/sql/jar/
> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-152%3A+Hive+Query+Syntax+Compatibility
> [3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-223%3A+Support+HiveServer2+Endpoint
>
> Best,
> Paul Lam
>
> Martijn Visser <martijnvis...@apache.org> wrote on Tue, Apr 26, 2022 at 20:14:
>
> > Hi Paul,
> >
> > Thanks for creating the FLIP and opening the discussion. I did get a bit
> > confused about the title, being "query lifecycle statements in SQL client".
> > This sounds like you want to adapt the SQL client, but you want to expand
> > the SQL syntax with lifecycle statements, which could be used from the SQL
> > client, but of course also in a Table API program that contains SQL. Given
> > that you're highlighting the CLI client as unchanged, this adds to the
> > confusion.
> >
> > I am interested whether there's anything listed in the SQL 2016 standard on
> > these types of lifecycle statements. I did a quick scan for "SHOW QUERIES"
> > but couldn't find it. It would be great if we could stay as close as
> > possible to such syntax. Overall I'm not in favour of using QUERIES as a
> > keyword. I think Flink applications are not queries, but short- or
> > long-running applications. Why should we follow Hive's setup and indeed not
> > others such as Snowflake, but also Postgres or MySQL?
> >
> > Best regards,
> >
> > Martijn Visser
> > https://twitter.com/MartijnVisser82
> > https://github.com/MartijnVisser
> >
> > On Fri, 22 Apr 2022 at 12:06, Paul Lam <paullin3...@gmail.com> wrote:
> >
> > > Hi Shengkai,
> > >
> > > Thanks a lot for your opinions!
> > >
> > > > 1. I think the keyword QUERY may confuse users because the statement
> > > > also works for DML statements.
> > >
> > > I slightly lean towards QUERY, because:
> > >
> > > Hive calls DMLs queries. We could be better aligned with Hive using QUERY,
> > > especially given that we plan to introduce a Hive endpoint.
> > > QUERY is a more SQL-like concept and friendly to SQL users.
> > >
> > > In general, my preference is: QUERY > JOB > TASK. I'm okay with JOB, but
> > > not very comfortable with TASK, as it conflicts with the task concept in
> > > the Flink runtime.
> > >
> > > We could wait for more feedback from the community.
> > >
> > > > 2. STOP/CANCEL is not very straightforward for SQL users to terminate
> > > > their jobs.
> > >
> > > Agreed. I'm okay with DROP. And if we want to align with Hive, KILL might
> > > be an alternative.
> > >
> > > > 3. I think CREATE/DROP SAVEPOINT statements are more SQL-like.
> > >
> > > Agreed. It's more SQL-like and intuitive. I'm updating the syntax in the
> > > FLIP.
> > >
> > > > 4. SHOW TASKS can just list the job ids, and users can use DESCRIBE to
> > > > get more detailed job info.
> > >
> > > That is a more SQL-like approach, I think. But considering the
> > > ClusterClient APIs, we can fetch the names and the status along in one
> > > request, so it may be more user-friendly to return them all in the SHOW
> > > statement?
> > >
> > > > It would be better if we could also get the info about the cluster where
> > > > the job is running through the DESCRIBE statement.
> > >
> > > I think cluster info could be part of the session properties instead. WDYT?
> > >
> > > Best,
> > > Paul Lam
> > >
> > > > On Apr 22, 2022, at 11:14, Shengkai Fang <fskm...@gmail.com> wrote:
> > > >
> > > > Hi Paul,
> > > >
> > > > Sorry for the late response. Here are my thoughts.
> > > >
> > > > 1. I think the keyword QUERY may confuse users because the statement
> > > > also works for DML statements. I find that Snowflake [1] supports:
> > > >
> > > > - CREATE TASK
> > > > - DROP TASK
> > > > - ALTER TASK
> > > > - SHOW TASKS
> > > > - DESCRIBE TASK
> > > >
> > > > I think we can follow Snowflake and use `TASK` as the keyword, or use
> > > > the keyword `JOB`?
> > > >
> > > > 2. STOP/CANCEL is not very straightforward for SQL users to terminate
> > > > their jobs.
> > > >
> > > > ```
> > > > DROP TASK [IF EXISTS] <job id> PURGE; -- Forcibly stop the job with drain
> > > >
> > > > DROP TASK [IF EXISTS] <job id>; -- Stop the task with a savepoint
> > > > ```
> > > >
> > > > Oracle [2] uses PURGE to clean up the table so that users cannot recover
> > > > it. I think it also works for us to terminate the job permanently.
> > > >
> > > > 3. I think CREATE/DROP SAVEPOINT statements are more SQL-like. Users can
> > > > use the following:
> > > >
> > > > ```
> > > > SET 'state.savepoints.dir' = '<path_to_savepoint>';
> > > > SET 'state.savepoints.format' = 'native';
> > > > CREATE SAVEPOINT <job id>;
> > > >
> > > > DROP SAVEPOINT <path_to_savepoint>;
> > > > ```
> > > >
> > > > 4. SHOW TASKS can just list the job ids, and users can use DESCRIBE to
> > > > get more detailed job info.
> > > >
> > > > ```
> > > > SHOW TASKS;
> > > >
> > > > +----------------------------------+
> > > > |              job_id              |
> > > > +----------------------------------+
> > > > | 0f6413c33757fbe0277897dd94485f04 |
> > > > +----------------------------------+
> > > >
> > > > DESCRIBE TASK <job id>;
> > > >
> > > > +-------------+---------+
> > > > |  job name   | status  |
> > > > +-------------+---------+
> > > > | insert-sink | running |
> > > > +-------------+---------+
> > > > ```
> > > >
> > > > It would be better if we could also get the info about the cluster where
> > > > the job is running through the DESCRIBE statement.
> > > >
> > > > [1] https://docs.snowflake.com/en/sql-reference/ddl-pipeline.html#task-management
> > > > [2] https://docs.oracle.com/cd/E11882_01/server.112/e41084/statements_9003.htm#SQLRF01806
> > > >
> > > > Paul Lam <paullin3...@gmail.com> wrote on Thu, Apr 21, 2022 at 10:36:
> > > >
> > > >> ping @Timo @Jark @Shengkai
> > > >>
> > > >> Best,
> > > >> Paul Lam
> > > >>
> > > >>> On Apr 18, 2022, at 17:12, Paul Lam <paullin3...@gmail.com> wrote:
> > > >>>
> > > >>> Hi team,
> > > >>>
> > > >>> I'd like to start a discussion about FLIP-222 [1], which adds query
> > > >>> lifecycle statements to the SQL client.
> > > >>>
> > > >>> Currently, the SQL client supports submitting queries (queries in a
> > > >>> broad sense, including DQLs and DMLs) but no further lifecycle
> > > >>> statements, like canceling a query or triggering a savepoint. That
> > > >>> makes SQL users have to rely on the CLI or the REST API to manage
> > > >>> their queries.
> > > >>>
> > > >>> Thus, I propose to introduce the following statements to fill the gap:
> > > >>>
> > > >>> SHOW QUERIES
> > > >>> STOP QUERY <query_id>
> > > >>> CANCEL QUERY <query_id>
> > > >>> TRIGGER SAVEPOINT <savepoint_path>
> > > >>> DISPOSE SAVEPOINT <savepoint_path>
> > > >>>
> > > >>> These statements would align the SQL client with the CLI, providing
> > > >>> full lifecycle management for queries/jobs.
> > > >>>
> > > >>> Please see the FLIP page [1] for more details. Thanks a lot!
> > > >>> (For reference, the previous discussion thread is at [2].)
> > > >>>
> > > >>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-222%3A+Support+full+query+lifecycle+statements+in+SQL+client
> > > >>> [2] https://lists.apache.org/thread/wr47ng0m2hdybjkrwjlk9ftwg403odqb
> > > >>>
> > > >>> Best,
> > > >>> Paul Lam