Thanks, Timo, for joining this thread.

I agree that this feature is needed by the community; the current disagreement 
is only about the implementation method or solution. 
Your thoughts look generally good to me. Looking forward to your proposal. 

Best,
Leonard

> On Jun 6, 2025, at 22:46, Timo Walther <twal...@apache.org> wrote:
> 
> Hi everyone,
> 
> thanks for this healthy discussion. Looking at the high number of participants, 
> it looks like we definitely want this feature. We just need to figure out the 
> "how".
> 
> This reminds me very much of the discussion we had for CREATE FUNCTION. 
> There, we discussed whether functions should be named globally or 
> catalog-specific. In the end, we decided on both `CREATE SYSTEM FUNCTION` 
> and `CREATE FUNCTION`, satisfying both the data platform team of an 
> organization (which might provide system functions) and individual data teams 
> or use cases (scoped by catalog/database).
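> To illustrate the two scopes with function syntax (a rough sketch; the 
> function name and class are placeholders, and Flink spells the global 
> variant with TEMPORARY SYSTEM):
> 
> ```sql
> -- globally visible, not bound to any catalog or database
> CREATE TEMPORARY SYSTEM FUNCTION geo_hash AS 'com.example.GeoHashFunction';
> 
> -- scoped by the usual three-part identifier
> CREATE FUNCTION my_catalog.my_db.geo_hash AS 'com.example.GeoHashFunction';
> ```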
> 
> Looking at other modern vendors like Snowflake, there is SECRET (scoped to 
> schema) [1] and API INTEGRATION (scoped to account) [2]. So other vendors also 
> offer both global and per-team / per-use-case connection details.
> 
> In general, I think fitting connections into the existing concepts for 
> catalog objects (with three-part identifier) makes managing them easier. But 
> I also see the need for global defaults.
> 
> Btw, keep in mind that a catalog implementation should only store metadata. 
> Similar to how a CatalogTable doesn't store the actual data, a CatalogConnection 
> should not store the credentials. It should only offer a factory that allows 
> for storing and retrieving them. In real-world scenarios, such a factory is most 
> likely backed by a product like Azure Key Vault.
> 
> So code-wise, having a ConnectionManager that behaves similarly to the 
> FunctionManager sounds reasonable.
> 
> +1 for having special syntax instead of using properties. This allows 
> accessing connections in tables, models, and functions. And in catalogs, if we 
> agree to have global ones as well.
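> As a purely illustrative sketch of such special syntax (the CONNECTION 
> keyword, its placement, and the identifier are invented here, not part of 
> any agreed proposal):
> 
> ```sql
> -- reference the connection by identifier instead of a string property
> CREATE TABLE customers (
>   customer_id INT,
>   PRIMARY KEY (customer_id) NOT ENFORCED
> ) WITH (
>   'connector' = 'jdbc',
>   'table-name' = 'customers'
> ) CONNECTION my_catalog.my_db.mysql_customer_db;
> ```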
> 
> What do you think?
> 
> Let me spend some more thought on this and come back with a concrete 
> proposal by early next week.
> 
> Cheers,
> Timo
> 
> [1] https://docs.snowflake.com/en/sql-reference/sql/create-secret
> [2] https://docs.snowflake.com/en/sql-reference/sql/create-api-integration
> 
> On 04.06.25 10:47, Leonard Xu wrote:
>> Hey Mayank,
>> Please see my feedback below:
>> 1. One of the motivations of this FLIP is to improve security. However, the 
>> current design stores all connection information in the catalog,
>> and each Flink SQL job reads from the catalog during compilation. The 
>> connection information is passed between the SQL Gateway and the
>> catalog in plaintext, which actually introduces new security risks.
>> 2. The name "Connection" should be changed to something like ConnectionSpec 
>> to clearly indicate that it is an object containing only static
>> properties, without a lifecycle. Putting aside the naming issue, I think the 
>> current model and hierarchy design is somewhat strange. Storing
>> various kinds of connections (e.g., Kafka, MySQL) in the same catalog with 
>> hierarchical identifiers like catalog-name.db-name.connection-name
>> raises the following questions:
>>  (1) What is the purpose of this hierarchical structure for the Connection object?
>>  (2) If we can use a Connection to create a MySQL table, why can't we use a 
>> Connection to create a MySQL Catalog?
>> 3. Regarding the connector usage examples given in this FLIP:
>> ```sql
>> 1  -- Example 2: Using connection for jdbc tables
>> 2  CREATE OR REPLACE CONNECTION mysql_customer_db
>> 3  WITH (
>> 4    'type' = 'jdbc',
>> 5    'jdbc.url' = 'jdbc:mysql://customer-db.example.com:3306/customerdb',
>> 6    'jdbc.connection.ssl.enabled' = 'true'
>> 7  );
>> 8
>> 9  CREATE TABLE customers (
>> 10   customer_id INT,
>> 11   PRIMARY KEY (customer_id) NOT ENFORCED
>> 12 ) WITH (
>> 13   'connector' = 'jdbc',
>> 14   'jdbc.connection' = 'mysql_customer_db',
>> 15   'jdbc.connection.ssl.enabled' = 'true',
>> 16   'jdbc.connection.max-retry-timeout' = '60s',
>> 17   'jdbc.table-name' = 'customers',
>> 18   'jdbc.lookup.cache' = 'PARTIAL'
>> 19 );
>> ```
>> I see three issues from the SQL semantics and connector compatibility 
>> perspectives:
>> (1) Look at line 14: `mysql_customer_db` is an object identifier of a 
>> CONNECTION defined in SQL. However, this identifier is referenced
>> via a string value inside the table’s WITH clause, which feels hacky to me.
>> (2) Look at lines 14–16: the use of the specific prefix `jdbc.connection` 
>> will confuse users, because `connection.xx` may already be used as
>> a prefix for existing configuration items.
>> (3) Look at lines 14–18: why do all existing configuration options need to 
>> be prefixed with `jdbc`, even if they’re not related to Connection properties?
>> This completely changes user habits; is it backward compatible?
>>  In my opinion, Connection should be a model independent of both Catalog and 
>> Table, and it should be referenceable by all catalog/table/UDF/model objects.
>> It should be managed by a component such as a ConnectionManager to enable 
>> reuse. For security purposes, authentication mechanisms could
>> be supported within the ConnectionManager.
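>> To make this concrete, here is a hypothetical sketch (the syntax is invented 
>> for illustration; the point is the single-part, catalog-independent name):
>> 
>> ```sql
>> -- defined once, outside any catalog hierarchy, held by a ConnectionManager
>> CREATE CONNECTION mysql_customer_db
>> WITH (
>>   'type' = 'jdbc',
>>   'jdbc.url' = 'jdbc:mysql://customer-db.example.com:3306/customerdb'
>> );
>> 
>> -- then referenced by the same identifier from a table, a catalog, or a model
>> ```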
>> Best,
>> Leonard
>>> On Jun 4, 2025, at 02:04, Martijn Visser <martijnvis...@apache.org> wrote:
>>> 
>>> Hi all,
>>> 
>>> First of all, I think having a Connection resource is something that will
>>> be beneficial for Apache Flink. I could see that being extended in the
>>> future to allow for easier secret handling [1].
>>> In my mind, I'm comparing this proposal against SQL/MED from the ISO
>>> standard [2]. I do think that SQL/MED isn't a very user-friendly syntax,
>>> though, looking at Postgres for example [3].
>>> 
>>> I think it's a valid question whether Connection should be considered with a
>>> catalog- or database-level scope. @Ryan, can you share something more, since
>>> you've mentioned "Note: I much prefer catalogs for this case. Which is what
>>> we use internally to manage connection properties"? It looks like there
>>> isn't a strongly favoured approach among other vendors (e.g.,
>>> Databricks scopes it on a Unity Catalog, Snowflake on a database
>>> level).
>>> 
>>> Also looking forward to Leonard's input.
>>> 
>>> Best regards,
>>> 
>>> Martijn
>>> 
>>> [1] https://issues.apache.org/jira/browse/FLINK-36818
>>> [2] https://www.iso.org/standard/84804.html
>>> [3] https://www.postgresql.org/docs/current/sql-createserver.html
>>> 
>>> On Fri, May 30, 2025 at 5:07 AM Leonard Xu <xbjt...@gmail.com> wrote:
>>> 
>>>> Hey Mayank.
>>>> 
>>>> Thanks for the FLIP. I went through it quickly and found some
>>>> issues that I think we
>>>> need to discuss in depth later. As we’re on a short Dragon Boat Festival
>>>> holiday, could you kindly hold
>>>> this thread? We will come back to continue the FLIP discussion.
>>>> 
>>>> Best,
>>>> Leonard
>>>> 
>>>> 
>>>>> On Apr 29, 2025, at 23:07, Mayank Juneja <mayankjunej...@gmail.com> wrote:
>>>>> 
>>>>> Hi all,
>>>>> 
>>>>> I would like to open up for discussion a new FLIP-529 [1].
>>>>> 
>>>>> Motivation:
>>>>> Currently, Flink SQL handles external connectivity by defining endpoints
>>>>> and credentials in table configuration. This approach prevents reusability
>>>>> of these connections and makes table definitions less secure by exposing
>>>>> sensitive information.
>>>>> We propose the introduction of a new "connection" resource in Flink. This
>>>>> will be a pluggable resource configured with a remote endpoint and an
>>>>> associated access key. Once defined, connections can be reused across table
>>>>> definitions, and eventually for model definitions (as discussed in FLIP-437)
>>>>> for inference, enabling seamless and secure integration with external
>>>>> systems.
>>>>> The connection resource will provide a new, optional way to manage external
>>>>> connectivity in Flink. Existing methods for table definitions will remain
>>>>> unchanged.
>>>>> 
>>>>> [1] https://cwiki.apache.org/confluence/x/cYroF
>>>>> 
>>>>> Best Regards,
>>>>> Mayank Juneja
>>>> 
>>>> 
> 
