Re: [DISCUSS] FLIP-275: Support Remote SQL Client Based on SQL Gateway

Timo Walther Mon, 12 Dec 2022 02:09:29 -0800

Hi everyone,

sorry to jump into this discussion so late.

> So we decided to revert the RowFormat related changes and let theclient to resolve the print format.

Could you elaborate a bit on this topic in the FLIP? I still believethat we need 2 types of output formats.

Format A: for the SQL Client CLI and other interactive notebooks thatjust uses SQL CAST(... AS STRING) semantics executed on the server side


Format B: for JDBC SDK or other machine-readable downstream libraries

Take a TIMESTAMP WITH LOCAL TIME ZONE as an example. The stringrepresentation depends on a session configuration option. Clients mightnot be aware of this session option, so the formatting must happen onthe server side.

However, when the downstream consumer is a library, maybe the librarywould like to get the raw millis/nanos since epoch.

Also nested rows and collections might be better encoded with format Bfor libraries but interactive sessions are happy if nested types arealready formatted server-side, so not every client needs custom code forthe formatting.


Regards,
Timo



On 06.12.22 15:13, godfrey he wrote:

Hi, zeklin

The CLI will use default print style for the non-query result.

Please make sure the print results of EXPLAIN/DESC/SHOW CREATE TABLE
commands are clear.

We think it’s better to add the root cause to the ErrorResponseBody.

LGTM

Best,
Godfrey

yu zelin <yuzelin....@gmail.com> 于2022年12月6日周二 17:51写道：


Hi, Godfrey

Thanks for your feedback. Below is my thoughts about your questions.

1. About RowFormat.
I agree to your opinion. So we decided to revert the RowFormat related changes
and let the client to resolve the print format.

2. About ContentType
I agree that the definition of the ContentType is not clear. But how to define 
the
statement type is another big question. So, we decided to only tell the query 
result
and non-query result apart. The CLI will use default print style for the 
non-query
result.

3. About ErrorHandling
I think reuse the current ErrorResponseBody is good, but parse the root cause
from the exception stack strings is quite hacking. We think it’s better to add 
the
root cause to the ErrorResponseBody.

4. About Runtime REST API Modifications
I agree, too. This part is moved to the ‘Future Work’.

Best,
Yu Zelin

2022年12月5日 18:33，godfrey he <godfre...@gmail.com> 写道：

Hi Zelin,

Thanks for driving this discussion.

I have a few comments,

Add RowFormat to ResultSet to indicate the format of rows.

We should not require SqlGateway server to meet the display
requirements of a CliClient.
Because different CliClients may have different display style. The
server just need to response the data,
and the CliClient prints the result as needed. So RowFormat is not needed.

Add ContentType to ResultSet to indicate what kind of data the result contains.

from my first sight, the values of ContentType are intersected, such
as: A select query will return QUERY_RESULT,
but it also has JOB_ID. OTHER is too ambiguous, I don't know which
kind of query will return OTHER.
I recommend returning the concrete type for each statement, such as
"CREATE TABLE" for "create table xx (...) with ()",
"SELECT" for "select * from xxx". The statement type can be maintained
in `Operation`s.

Error Handling

I think current design of error handling mechanism can meet the
requirement of CliClient, we can get the root cause from
the stack (see ErrorResponseBody#errors). If it becomes a common
requirement (for many clients) in the future,
we can introduce this interface.

Runtime REST API Modification for Local Client Migration

I think this part is over-engineered, this part belongs to optimization.
The client does not require very high performance, the current design
can already meet our needs.
If we find performance problems in the future, do such optimizations.

Best,
Godfrey

yu zelin <yuzelin....@gmail.com> 于2022年12月5日周一 11:11写道：


Hi, Shammon

Thanks for your feedback. I think it’s good to support jdbc-sdk. However,
it's not supported in the gateway side yet. In my opinion, this FLIP is more
concerned with the SQL Client. How about put “supporting jdbc-sdk” in
‘Future Work’? We can discuss how to implement it in another thread.

Best,
Yu Zelin

2022年12月2日 18:12，Shammon FY <zjur...@gmail.com> 写道：

Hi zelin

Thanks for driving this discussion.

I notice that the sql-client will interact with sql-gateway by `REST
Client` in the `Executor` in the FLIP, how about introducing jdbc-sdk for
sql-gateway?

Then the sql-client can connect the gateway with jdbc-sdk, on the other
hand, the other applications and tools such as jmeter can use the jdbc-sdk
to connect sql-gateway too.

Best,
Shammon


On Fri, Dec 2, 2022 at 4:10 PM yu zelin <yuzelin....@gmail.com> wrote:

Hi Jim,

Thanks for your feedback!

Should this configuration be mentioned in the FLIP?


Sure.

some way for the server to be able to limit the number of requests it

receives.
I’m sorry that this FLIP is dedicated in implementing the Remote mode, so
we
didn't consider much about this. I think the option is enough currently.
I will add
the improvement suggestions to the ‘Future Work’.

I wonder if two other options are possible


To forward the raw format to gateway and then to client is possible. The
raw
results from sink is in ‘CollectResultIterator#bufferedResult’. First, we
can find
a way to get this result without wrapping it. Second, constructing a
‘InternalTypeInfo’.
We can construct it using the schema information (data’s logical type).
After
construction, we can get the ’TypeSerializer’ to deserialize the raw
result.

2022年12月1日 04:54，Jim Hughes <jhug...@confluent.io.INVALID> 写道：

Hi Yu,

Thanks for moving my comments to this thread!  Also, thank you for
answering my questions; it is helping me understand the SQL Gateway
better.

5.

Our idea is to introduce a new session option (like

'sql-client.result.fetch-interval') to control
the fetching requests sending frequency. What do you think?

Should this configuration be mentioned in the FLIP?

One slight concern I have with having 'sql-client.result.fetch-interval'

as

a session configuration is that users could set it low and cause the

client

to send a large volume of requests to the SQL gateway.

Generally, I'd like to see some way for the server to be able to limit

the

number of requests it receives.  If that really needs to be done by a

proxy

in front of the SQL gateway, that is fine as well.  (To be clear, I don't
think my concern here should be blocking in any way.)

7.

What is the serialization lifecycle for results?


I wonder if two other options are possible:
3) Could the Gateway just forward the result byte array?  (Or does the
Gateway need to deserialize the response in order to understand it for

some

reason?)
4) Could the JobManager prepare the results in JSON?  (Or similarly could
the Client read the format which the JobManager sends?)

Thanks again!

Cheers,

Jim

On Wed, Nov 30, 2022 at 9:40 AM yu zelin <yuzelin....@gmail.com> wrote:

Hi, all

Thanks Jim’s questions below. Here I’d like to reply to them.

1. For the Client Parser, is it going to work with the extended syntax
from the Flink Table Store?

2. Relatedly, what will happen if an older Client tries to handle

syntax

that a newer service supports?  (Suppose I use a 1.17 client with a

1.18

Gateway/system which has a new keyword.  Is there anything we should

be

designing for upfront?)

3. How will client and server version mismatches be handled?  Will a
single gateway be able to support multiple endpoint versions?
4. How are commands which change a session handled?  Are those sent

via

an ExecuteStatementRequest?

5. The remote POC uses polling for getting back status and getting

back

results.  Would it be possible to switch to web sockets or some other
mechanism to avoid polling?  If polling is used for both, the polling
frequency should be different between local and remote configurations.

6. What does this sentence mean?  "The reason why we didn't get the

sql

type in client side is because it's hard for the lightweight

client-level

parser to recognize some sql type  sql, such as query with CTE.  "

7. What is the serialization lifecycle for results?  It makes sense to
have some control over whether the gateway returns results as SQL or

JSON.

I'd love to see a way to avoid needing to serialize and deserialize

results

on the SQL Gateway if possible.  I'm still new enough to the project

that

I'm not sure if that's readily possible.  Maybe the SQL Gateway's

return

type can be sent as part of the request so that the JobManager can

send

back results in an advantageous format?

8. Does ErrorType need to be marked as @PublicEvolving?

I'm excited for the SQL client to support gateway mode!  Given the

change

in design, do you think it'll still be part of the Flink 1.17 release?


1.  ClientParser can work with new (and unknown) SQL syntax. It is

because

if the
sql type is not recognized, the sql will be submitted to the gateway
directly.

For more information: Actually, the proposed ClientParser only do two
things:
(1) Tell client commands (help, clear, etc) and sqls apart.
(2) parses several sql types (e.g. SHOW CREATE statement, we can print

raw

string
for the SHOW CREATE result instead of table). Here the recognization of
sql types
mostly affects the print style, and unrecognized sql also can be

submitted

to cluster.
So the Client with new ClientParser can work compatible with new syntax.

2. First, I'd like to explain that the gateway APIs and supported syntax
is two things.
For example, ‘configureSession' and 'completeStatement' are APIs. As
mentioned
in #1, the sql statements which syntax is unknown will be submitted to

the

gateway,
and whether they can be executed normally depends on whether the

execution

environment supports the syntax.

Is there anything we should be designing for upfront?


The 'SqlGatewayRestAPIVersion’ has been introduced. But it is for sql
gateway APIs.

3.

How will client and server version mismatches be handled?


A lower version client can work compatible with a higher version gateway
because the
old interfaces won’t be deleted. When a higher version client connects

to

a lower version
gateway, the client should notify the users if they try to use

unsupported

features. For
example, the client start option ‘-i’  means using initialization file

to

initialize the session.
We plan to use the gateway’s ‘configureSession’ to implement it. But

this

API is not
implemented in 1.16 Gateway (SqlGatewayRestAPIVersion = V1), so if the
user try to
use ‘-i’ option to start the client with the 1.16 gateway, the client
should tell the user that
Can’t execute ‘-i’ option with gateway which version is lower than V2.

Will a single gateway be able to support multiple endpoint versions?


Currently, the gateway only starts a highest version endpoint and the
higher version endpoint
is compatible with the lower version endpoint’s protocol.

4. Yes. Mostly, we use ’SET’ and ‘RESET’ statements to change the

session

configuration.
Notice: the client can’t change the session (I mean, close current

session

and open another
one). I’m not sure if you have need to change the session itself?

5.

Would it be possible to switch to web sockets or some other mechanism

to avoid polling?

Your suggestion is very good, but this flip is for supporting the remote
client. How about taking
it as a future work?

If polling is used for both, the polling frequency should be different

between local and remote
configurations.

Our idea is to introduce a new session option (like
'sql-client.result.fetch-interval') to control
the fetching requests sending frequency. What do you think?

For more information: we are inclined to keep the polling behavior in

this

version. For streaming
query, fetching results synchronously may occupy resources of the

gateway

in a long period.
For example, if the job doesn’t return results for a long time because

the

window has not been
triggered, the synchronously fetching will keep occupying the

connection.

In asynchronous
situation, the gateway can return a NOT_READY_RESULT quickly and release
the resources
for other clients to use. I think we can make some improvements for the
whole flow path in the
future.

6. Sorry for that there is mistakes in this sentence. Let me make it

clear.


We proposed to add 'ContentType' to indicates the result is for what

kind

of sql. In this sentence,
I want to explain why we add 'ContentType' since the ClientParser can
recognize the sql type too.
It is because the proposed ClientParser can't recognize complex syntax.
For example, it can’t
recognize query with CTE. So the result should carry content type
information to help the client to
know the sql type. For example, the 'ContentType.QUERY_RESULT' indicates
the result is for a
query statement.

7.

What is the serialization lifecycle for results?


1) Sink to JobManager        : RowData -> Byte[ ] (serialize)
2) JobManager to Gateway : Byte[ ] -> RowData (deserialize)
3) Gateway sending            : RowData -> Byte[ ] (serialized to JSON
format)
4) Client receiving               : Byte[ ] -> RowData (deserialize)

Maybe the SQL Gateway's return type can be sent as part of the request

so that the
JobManager can send  back results in an advantageous format?

Yes. I think it's an improvement for the Client and Gateway. We have

some

ideas. For example,

1) We can move the Gateway into the JobManager and reduce the Ser/De

costs

from JM to Gateway.
2) Or the Gateway can collect the data from the sink function directly
instead of JobManager.

But I think we can leave this as a future work and discuss in another
thread.

8. Yes.

Do you think it'll still be part of the Flink 1.17 release?

Yes. We will try our best to finish the work.

Feel free to talk to me if I’m wrong or you have any other questions.

2022年11月25日 11:48，yu zelin <yuzelin....@gmail.com> 写道：

Hi, all

I want to initiate a discussion on the FLIP-275: Support Remote SQL

Client Based on SQL Gateway[1].

The motivation of this FLIP is that the current SQL Client allows only

local connection which can not satisfy

the common need of connecting to a remote cluster.

Since the FLIP-91[2] has introduced SQL Gateway, we proposed to

implement the Remote SQL Client

based on SQL Gateway. In our design, we proposed two main changes:

1. New remote mode client which performs connection to the remote

gateway through REST API.

2. Migration of the current local mode client. We proposed to refactor

the local client based on SQL Gateway

to unify the interface for two modes.

Looking forward to your suggestions.

Best,
Yu Zelin

[1] https://cwiki.apache.org/confluence/x/T48ODg
[2] https://cwiki.apache.org/confluence/x/rIyMC

Re: [DISCUSS] FLIP-275: Support Remote SQL Client Based on SQL Gateway

Reply via email to