Hi @leo65535, thanks for the reply. I'm very glad to hear about the topic of decoupling connectors from the Flink version. That's a good direction.
Regarding SQL connectors management, I don't have a complete design in my head yet, but here are a few points worth discussing first:

1. Plugin style. For SQL, we don't need to write glue code to "register" connectors. All we need to do is put the connector jars on the classpath; the processes (JM/TM) can then find the connector classes, which implement the specified interfaces, through a mechanism like SPI (see the first sketch at the end of this mail). So maybe we can't manage the connectors as plugins; when we submit a job, we simply add the corresponding connector jars to it as resources.

2. Decoupling connectors from the Flink version. Decoupling is necessary. To my knowledge, Flink SQL uses a factory discovery mechanism to manage connectors, so as long as this interface is not refactored in the Flink core, we can expect it to stay backward compatible (see the second sketch below). Moreover, if it is version-related, we can manage the jars in a layout like `/plugins/<flink-version>/connectors/kafka`.

3. Dynamic loading of connectors. To reduce class conflicts between connectors, I propose loading connectors only when necessary, i.e. dynamically. As for how to find the connectors used in the SQL, I think we can either specify them in the config file or parse the SQL to extract the connector info (see the third sketch below).

Maybe we can discuss this topic later in a more detailed tech design doc.

Thanks,
Kelu.

On Fri, Apr 29, 2022 at 4:10 PM leo65535 <[email protected]> wrote:

> Hi @taokelu,
>
> This proposal is nice!!
> We also discussed another topic, "Decoupling connectors from compute
> engines" [1]. I have a question: how do we manage Flink SQL connectors?
>
> [1] https://lists.apache.org/thread/j99crn7nkfpwovng6ycbxhw65sxg9xn2
>
> Thanks,
> leo65535
>
> At 2022-04-29 11:06:42, "陶克路" <[email protected]> wrote:
>
> The background: https://github.com/apache/incubator-seatunnel/issues/1753
>
> Let me give a brief introduction to the background. I found that the
> Flink SQL support in SeaTunnel is very simple, so I want to make some
> improvements on this story.
>
> Also, SeaTunnel currently uses many deprecated DataStream APIs, which
> are encouraged to be replaced with SQL, such as
> `StreamTableEnvironment.connect`. Maybe SQL would be an alternative.
>
> Here are the improvement details:
> 1. Refactor start-seatunnel-sql.sh. start-seatunnel-flink.sh and
> start-seatunnel-spark.sh have already been refactored, with the main
> logic rewritten in Java. I think we can first keep them consistent.
> 2. Enrich the SQL config file. Currently the Flink SQL job config is
> very simple; it is all about the SQL script. I think we can add more
> sub-configs to it.
> 3. SQL connectors management. The Flink community supports a rich set
> of SQL connectors. Only with connectors can we run our jobs
> successfully end-to-end.
> 4. SQL-related logic, such as validation before the job runs, so
> errors are thrown as early as possible.
> 5. Catalog support. With a catalog, we can reuse the tables/UDFs
> defined in it.
> 6. Kubernetes native mode support. Actually, this is a universal
> feature, not just about SQL. In Flink, to run a job in Kubernetes
> native mode, we must bundle the main jar and dependency files into the
> Flink image, which is not user-friendly. The community supports a
> workaround for this, namely podTemplate.
> 7. ...
>
> This is a long-term plan. We can implement it step by step.
>
> What do you think about this PROPOSAL? Feel free to give any comment
> or suggestion.
>
> Thanks.
> Kelu.
> --
> Hello, Find me here: www.legendtkl.com.

--
Hello, Find me here: www.legendtkl.com.
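P.S. A few rough sketches for the three points above, just to make the discussion concrete; these are illustrations under stated assumptions, not a design.

For point 1, a minimal sketch of the SPI-style discovery: Flink SQL connectors register their factories under `META-INF/services/org.apache.flink.table.factories.Factory`, so once the jars are on the classpath, a plain `ServiceLoader` scan finds them without any glue code (`ListConnectors` is just a throwaway class name for the example):

```java
import java.util.ServiceLoader;

import org.apache.flink.table.factories.Factory;

// Scan the classpath for all connector factories registered via SPI.
// Each Flink SQL connector jar ships a META-INF/services entry for Factory.
public class ListConnectors {
    public static void main(String[] args) {
        for (Factory factory : ServiceLoader.load(Factory.class)) {
            // factoryIdentifier() is the name used in WITH ('connector' = '...').
            System.out.println(factory.factoryIdentifier());
        }
    }
}
```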
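For point 2, once the jars live under a version-matched directory like `/plugins/<flink-version>/connectors/kafka`, resolving a connector could go through Flink's factory discovery. A sketch, assuming `connectorClassLoader` is a classloader built over that directory's jars (see the next sketch for one way to build it):

```java
import org.apache.flink.table.factories.DynamicTableSourceFactory;
import org.apache.flink.table.factories.FactoryUtil;

public class ResolveConnector {
    // Resolve the factory for the "kafka" connector against the
    // version-matched jars only; connectorClassLoader is an assumption here.
    static DynamicTableSourceFactory resolveKafkaSource(ClassLoader connectorClassLoader) {
        return FactoryUtil.discoverFactory(
                connectorClassLoader, DynamicTableSourceFactory.class, "kafka");
    }
}
```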
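For point 3, a rough sketch of the "parse the SQL" option: pull the connector identifiers out of the `WITH` clauses, then build a `URLClassLoader` over only the jars those connectors need. The regex and the directory layout are assumptions for illustration; a real implementation should use the SQL parser rather than a regex:

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ConnectorLoading {

    // Naive extraction of the WITH ('connector' = '...') identifiers from the SQL.
    static Set<String> findConnectors(String sql) {
        Set<String> names = new HashSet<>();
        Matcher m = Pattern.compile("'connector'\\s*=\\s*'([^']+)'").matcher(sql);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // Build a classloader over only the jars of the connectors the job uses,
    // assuming a layout like /plugins/<flink-version>/connectors/<name>/.
    static ClassLoader loadConnectors(Set<String> names, Path connectorRoot) throws Exception {
        List<URL> jars = new ArrayList<>();
        for (String name : names) {
            try (Stream<Path> files = Files.list(connectorRoot.resolve(name))) {
                for (Path jar : files.filter(p -> p.toString().endsWith(".jar"))
                        .collect(Collectors.toList())) {
                    jars.add(jar.toUri().toURL());
                }
            }
        }
        return new URLClassLoader(
                jars.toArray(new URL[0]), Thread.currentThread().getContextClassLoader());
    }
}
```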
