Re: [DISCUSS] FLIP-275: Support Remote SQL Client Based on SQL Gateway

yu zelin Wed, 30 Nov 2022 06:39:34 -0800

Hi, all

Thanks to Jim for the questions below. I'd like to reply to them here.


>   1. For the Client Parser, is it going to work with the extended syntax
>   from the Flink Table Store?
> 
>   2. Relatedly, what will happen if an older Client tries to handle syntax
>   that a newer service supports?  (Suppose I use a 1.17 client with a 1.18
>   Gateway/system which has a new keyword.  Is there anything we should be
>   designing for upfront?)
> 
>   3. How will client and server version mismatches be handled?  Will a
>   single gateway be able to support multiple endpoint versions?
>   4. How are commands which change a session handled?  Are those sent via
>   an ExecuteStatementRequest?
> 
>   5. The remote POC uses polling for getting back status and getting back
>   results.  Would it be possible to switch to web sockets or some other
>   mechanism to avoid polling?  If polling is used for both, the polling
>   frequency should be different between local and remote configurations.
> 
>   6. What does this sentence mean?  "The reason why we didn't get the sql
>   type in client side is because it's hard for the lightweight client-level
>   parser to recognize some sql type  sql, such as query with CTE.  "
> 
>   7. What is the serialization lifecycle for results?  It makes sense to
>   have some control over whether the gateway returns results as SQL or JSON.
>   I'd love to see a way to avoid needing to serialize and deserialize results
>   on the SQL Gateway if possible.  I'm still new enough to the project that
>   I'm not sure if that's readily possible.  Maybe the SQL Gateway's return
>   type can be sent as part of the request so that the JobManager can send
>   back results in an advantageous format?
> 
>   8. Does ErrorType need to be marked as @PublicEvolving?
> 
> I'm excited for the SQL client to support gateway mode!  Given the change
> in design, do you think it'll still be part of the Flink 1.17 release?

1. ClientParser can work with new (and unknown) SQL syntax, because if the SQL
type is not recognized, the statement is submitted to the gateway directly.

For more information: the proposed ClientParser only does two things:
(1) It tells client commands (help, clear, etc.) and SQL statements apart.
(2) It parses a few SQL statement types. For example, for a SHOW CREATE
statement we can print the result as a raw string instead of a table.
Recognizing the SQL type mostly affects the print style, and unrecognized SQL
can still be submitted to the cluster. So a client with the new ClientParser
stays compatible with new syntax.
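
To make the two responsibilities of the parser concrete, here is a rough
Python sketch. It is only an illustration, not the actual implementation: the
names StatementType, CLIENT_COMMANDS and classify are hypothetical, the list of
client commands is assumed, and the real ClientParser is Java code that may
look quite different.

```python
from enum import Enum


class StatementType(Enum):
    CLIENT_COMMAND = "client_command"  # handled locally, never sent to the gateway
    SHOW_CREATE = "show_create"        # result is printed as a raw string
    OTHER = "other"                    # not recognized: forwarded to the gateway as-is


# Hypothetical set of client-only commands; the real list may differ.
CLIENT_COMMANDS = {"HELP", "CLEAR", "QUIT", "EXIT"}


def classify(statement: str) -> StatementType:
    """Classify a statement by looking at its leading keywords only."""
    tokens = statement.strip().rstrip(";").split()
    if not tokens:
        return StatementType.OTHER
    first = tokens[0].upper()
    if len(tokens) == 1 and first in CLIENT_COMMANDS:
        return StatementType.CLIENT_COMMAND
    if first == "SHOW" and len(tokens) > 1 and tokens[1].upper() == "CREATE":
        return StatementType.SHOW_CREATE
    # Anything else, including syntax this client has never seen, falls through.
    return StatementType.OTHER
```

The point of the sketch is the last branch: a statement with unknown syntax is
never rejected on the client side, it is simply classified as OTHER and sent on.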

2. First, I'd like to explain that the gateway APIs and the supported syntax
are two different things. For example, 'configureSession' and
'completeStatement' are APIs. As mentioned in #1, SQL statements whose syntax
is unknown to the client are submitted to the gateway, and whether they execute
normally depends on whether the execution environment supports the syntax.

> Is there anything we should be designing for upfront?

'SqlGatewayRestAPIVersion' has been introduced, but it only versions the SQL
gateway APIs, not the SQL syntax.

3. 
> How will client and server version mismatches be handled?

A lower-version client stays compatible with a higher-version gateway because
the old interfaces won't be deleted. When a higher-version client connects to a
lower-version gateway, the client should notify users if they try to use
unsupported features. For example, the client start option '-i' means using an
initialization file to initialize the session. We plan to implement it with the
gateway's 'configureSession' API. But this API is not implemented in the 1.16
gateway (SqlGatewayRestAPIVersion = V1), so if a user tries to start the client
with the '-i' option against a 1.16 gateway, the client should tell the user:
"Can't execute the '-i' option with a gateway whose version is lower than V2."
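
The '-i' check described above could look roughly like the following Python
sketch. The names RestApiVersion and check_init_file_supported are hypothetical,
and I am assuming that V2 is the first REST API version that offers
'configureSession'; the real client is Java and may structure this differently.

```python
from enum import IntEnum


class RestApiVersion(IntEnum):
    V1 = 1  # Flink 1.16 gateway: 'configureSession' is not implemented
    V2 = 2  # assumed: first version that offers 'configureSession'


def check_init_file_supported(gateway_version: RestApiVersion) -> None:
    """Fail early on the client side if the gateway is too old for '-i'."""
    if gateway_version < RestApiVersion.V2:
        raise RuntimeError(
            "Can't execute the '-i' option with a gateway whose version "
            "is lower than V2."
        )
```

Calling check_init_file_supported(RestApiVersion.V1) raises the error before
any request is sent, while RestApiVersion.V2 passes silently.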

>  Will a single gateway be able to support multiple endpoint versions?

Currently, the gateway only starts one endpoint, at the highest version, and
the higher-version endpoint is compatible with the lower-version endpoint's
protocol.

4. Yes. Mostly, we use 'SET' and 'RESET' statements to change the session
configuration. Note that the client can't switch sessions (I mean, close the
current session and open another one). I'm not sure whether you need to change
the session itself?

5. 
>  Would it be possible to switch to web sockets or some other mechanism to 
> avoid polling?

Your suggestion is very good, but this FLIP is about supporting the remote
client. How about treating it as future work?

> If polling is used for both, the polling frequency should be different
> between local and remote configurations.

Our idea is to introduce a new session option (like
'sql-client.result.fetch-interval') to control how frequently fetch requests
are sent. What do you think?

For more information: we are inclined to keep the polling behavior in this
version. For a streaming query, fetching results synchronously may occupy
gateway resources for a long period. For example, if the job doesn't return
results for a long time because the window has not been triggered, a
synchronous fetch will keep occupying the connection. In the asynchronous case,
the gateway can return a NOT_READY_RESULT quickly and release the resources for
other clients to use. I think we can improve the whole flow in the future.
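
As a rough illustration of the polling flow with a configurable fetch interval,
here is a Python sketch. Everything in it is an assumption made for the example:
the function poll_results, the shape of the fetch callback, the token handling,
and the PAYLOAD and EOS markers are hypothetical; only the "not ready" idea
comes from the description above.

```python
import time
from typing import Callable, Iterator, List, Tuple

# Hypothetical result markers for this sketch.
NOT_READY = "NOT_READY"  # gateway has nothing yet and returns immediately
PAYLOAD = "PAYLOAD"      # gateway returns a batch of rows
EOS = "EOS"              # no more results

# fetch(token) -> (status, rows, next_token); stands in for one REST call.
FetchFn = Callable[[int], Tuple[str, List[str], int]]


def poll_results(
    fetch: FetchFn,
    fetch_interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield rows, waiting fetch_interval_s whenever the gateway is not ready."""
    token = 0
    while True:
        status, rows, next_token = fetch(token)
        if status == EOS:
            yield from rows
            return
        if status == NOT_READY:
            # The connection is already released; wait before asking again.
            sleep(fetch_interval_s)
            continue
        yield from rows
        token = next_token
```

In this sketch the client only waits when the gateway reports that nothing is
ready, so the interval value (for example the proposed fetch-interval option)
could be set lower for a local setup and higher for a remote one.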

6. Sorry, there are mistakes in that sentence. Let me make it clear.

We proposed adding 'ContentType' to indicate what kind of SQL statement the
result is for. In that sentence, I wanted to explain why we add 'ContentType'
even though the ClientParser can recognize the SQL type too. It is because the
proposed ClientParser can't recognize complex syntax. For example, it can't
recognize a query with a CTE. So the result should carry content type
information to tell the client the SQL type. For example,
'ContentType.QUERY_RESULT' indicates that the result is for a query statement.
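
A small Python sketch of why the content type helps: a keyword-based check on
the client misses a query that starts with a CTE, while the content type
attached to the result does not depend on client-side parsing. The functions
looks_like_query and is_query_result and the OTHER member are hypothetical and
exist only for this example.

```python
from enum import Enum


class ContentType(Enum):
    QUERY_RESULT = "QUERY_RESULT"  # the value named in this thread
    OTHER = "OTHER"                # hypothetical placeholder for all other kinds


def looks_like_query(statement: str) -> bool:
    """Naive client-side guess that only inspects the first keyword."""
    return statement.lstrip().upper().startswith("SELECT")


def is_query_result(content_type: ContentType) -> bool:
    """Decision based on what the gateway reports, not on client parsing."""
    return content_type is ContentType.QUERY_RESULT
```

For "WITH t AS (SELECT 1) SELECT * FROM t", looks_like_query returns False even
though the statement is a query, whereas a result tagged with
ContentType.QUERY_RESULT is still handled as a query result.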

7. 
> What is the serialization lifecycle for results?

1) Sink to JobManager    : RowData -> Byte[] (serialize)
2) JobManager to Gateway : Byte[] -> RowData (deserialize)
3) Gateway sending       : RowData -> Byte[] (serialize to JSON format)
4) Client receiving      : Byte[] -> RowData (deserialize)
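
Steps 3) and 4) can be pictured with the following Python sketch. It is only a
stand-in: plain lists replace RowData, and the function names and the "data"
envelope key are hypothetical, not the gateway's actual wire format.

```python
import json
from typing import Any, List

Row = List[Any]  # stand-in for RowData in this sketch


def gateway_serialize(rows: List[Row]) -> bytes:
    """Step 3: the gateway turns rows into JSON bytes for the REST response."""
    return json.dumps({"data": rows}).encode("utf-8")


def client_deserialize(payload: bytes) -> List[Row]:
    """Step 4: the client turns the JSON bytes back into rows."""
    return json.loads(payload.decode("utf-8"))["data"]
```

Steps 1) and 2) add one more serialize/deserialize round before this one, which
is the extra cost the question is about.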

> Maybe the SQL Gateway's return type can be sent as part of the request so
> that the JobManager can send back results in an advantageous format?

Yes. I think it would be an improvement for the Client and Gateway. We have
some ideas. For example:

1) We can move the Gateway into the JobManager to reduce the Ser/De costs from
JM to Gateway.
2) Or the Gateway can collect the data from the sink function directly instead
of going through the JobManager.

But I think we can leave this as future work and discuss it in another thread.

8. Yes.

> Do you think it'll still be part of the Flink 1.17 release?
Yes. We will try our best to finish the work.

Feel free to correct me if I'm wrong or to ask any other questions.


> On Nov 25, 2022, at 11:48, yu zelin <yuzelin....@gmail.com> wrote:
> 
> Hi, all
> 
> I want to initiate a discussion on the FLIP-275: Support Remote SQL Client
> Based on SQL Gateway[1].
> The motivation of this FLIP is that the current SQL Client allows only local
> connection which can not satisfy the common need of connecting to a remote
> cluster.
> 
> Since the FLIP-91[2] has introduced SQL Gateway, we proposed to implement the
> Remote SQL Client based on SQL Gateway. In our design, we proposed two main
> changes:
> 
> 1. New remote mode client which performs connection to the remote gateway
>    through REST API.
> 2. Migration of the current local mode client. We proposed to refactor the
>    local client based on SQL Gateway to unify the interface for two modes.
> 
> Looking forward to your suggestions.
> 
> Best,
> Yu Zelin
> 
> [1] https://cwiki.apache.org/confluence/x/T48ODg
> [2] https://cwiki.apache.org/confluence/x/rIyMC
