Re: [PR] [FLINK-38959][pipeline-connector][postgres]Added support for schema change in the postgres pipeline connector [flink-cdc]

via GitHub Fri, 13 Mar 2026 00:42:36 -0700


loserwang1024 commented on PR #4233:
URL: https://github.com/apache/flink-cdc/pull/4233#issuecomment-4053361936

I have reviewed this PR (by @linjianchang) and
https://github.com/apache/flink-cdc/pull/4259 (by @zml1206). Thanks for your
hard work!

I have also done some research on the Debezium codebase and PostgreSQL
replication, and I would like to propose another design that relies more fully
on PostgreSQL's native capabilities. I'd like to discuss it with you.

------------------------------------------------------

### The pgoutput Relation message in PostgreSQL logical replication
As the PostgreSQL 16 documentation says in [Section 55.5.3 - Logical Replication
Protocol Message
Flow](https://www.postgresql.org/docs/16/protocol-logical-replication.html):

> "Every DML message contains a relation OID, identifying the publisher's
relation that was acted on. Before the first DML message for a given relation
OID, a Relation message will be sent, describing the schema of that relation.
Subsequently, a new Relation message will be sent if the relation's definition
has changed since the last Relation message was sent for it."

When DDL is executed in PostgreSQL, logical replication emits no message for
the DDL itself. Instead, when the pgoutput sender is about to send the first
DML message under a new schema, it first sends a Relation message.
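
To make the message flow concrete, here is a minimal sketch of a consumer that caches the last Relation message per relation OID and reports when a later one changes the schema. `RelationTracker`, `RelationMessage`, and `Column` are hypothetical names for illustration only; they are not Debezium or Flink CDC classes, and the real pgoutput message carries more fields.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a consumer that tracks pgoutput Relation messages. */
final class RelationTracker {

    /** Hypothetical, simplified column description (name plus type name). */
    record Column(String name, String typeName) {}

    /** Hypothetical, simplified stand-in for a pgoutput Relation message. */
    record RelationMessage(int relationOid, String tableName, List<Column> columns) {}

    // Last Relation message seen for each relation OID.
    private final Map<Integer, RelationMessage> lastSeen = new HashMap<>();

    /**
     * Records the message and reports whether it changes an already known schema.
     * The first message for an OID only registers the schema and returns false.
     */
    boolean onRelation(RelationMessage message) {
        RelationMessage previous = lastSeen.put(message.relationOid(), message);
        return previous != null && !previous.columns().equals(message.columns());
    }

    /** Returns the schema that applies to DML carrying this OID, or null if unknown. */
    RelationMessage schemaFor(int relationOid) {
        return lastSeen.get(relationOid);
    }

    public static void main(String[] args) {
        RelationTracker tracker = new RelationTracker();
        RelationMessage before = new RelationMessage(
                16384, "public.orders", List.of(new Column("id", "int4")));
        RelationMessage after = new RelationMessage(
                16384, "public.orders",
                List.of(new Column("id", "int4"), new Column("note", "text")));

        System.out.println("first message is a change: " + tracker.onRelation(before));
        System.out.println("second message is a change: " + tracker.onRelation(after));
    }
}
```

The point of the sketch is that a schema change is detected once, when the Relation message arrives, not by inspecting every DML message.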

### How does Debezium use it?
Depending on the decoder plug-in, schema updates take two completely
different paths:
- pgoutput: A Relation message arrives before the DML events that depend on
it, so applySchemaChangesForTable can update the schema proactively.
shouldSchemaBeSynchronized() returns false, so synchronizeTableSchema() is a
no-op for DML events.
- decoderbufs: No Relation message is sent, so shouldSchemaBeSynchronized()
returns true (the default value). The schema is synchronized reactively by
comparing the columns of each message with the in-memory schema.

So with pgoutput, can we send schema change events on demand, without
comparing every message?
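
The difference between the two paths can be sketched roughly as follows. This is a simplified model, not Debezium's code: only the method name `shouldSchemaBeSynchronized()` is taken from the discussion above, while `SchemaSyncPaths`, `Decoder`, `PgOutputLikeDecoder`, and `DecoderBufsLikeDecoder` are hypothetical names, and columns are reduced to a list of names.

```java
import java.util.List;

/** Hypothetical model of the proactive (pgoutput) and reactive (decoderbufs) paths. */
final class SchemaSyncPaths {

    /** Hypothetical decoder abstraction; true by default, as described above. */
    interface Decoder {
        default boolean shouldSchemaBeSynchronized() {
            return true;
        }
    }

    /** Models pgoutput: Relation messages make per-message synchronization unnecessary. */
    static final class PgOutputLikeDecoder implements Decoder {
        @Override
        public boolean shouldSchemaBeSynchronized() {
            return false;
        }
    }

    /** Models decoderbufs: keeps the default and relies on per-message comparison. */
    static final class DecoderBufsLikeDecoder implements Decoder {}

    private final Decoder decoder;
    private List<String> knownColumns;
    private int comparisons = 0;

    SchemaSyncPaths(Decoder decoder, List<String> initialColumns) {
        this.decoder = decoder;
        this.knownColumns = List.copyOf(initialColumns);
    }

    /** Proactive path: a Relation message replaces the in-memory schema directly. */
    void onRelationMessage(List<String> columns) {
        knownColumns = List.copyOf(columns);
    }

    /** Called for every DML message; compares columns only on the reactive path. */
    void onDml(List<String> messageColumns) {
        if (decoder.shouldSchemaBeSynchronized()) {
            comparisons++;
            if (!knownColumns.equals(messageColumns)) {
                knownColumns = List.copyOf(messageColumns);
            }
        }
    }

    int comparisons() {
        return comparisons;
    }

    List<String> knownColumns() {
        return knownColumns;
    }
}
```

In this model the pgoutput-like path performs zero comparisons regardless of how many DML messages arrive, which is the property the proposal wants to exploit.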

### What does CDC need to do?
Therefore, in my opinion:
1. Extend PostgresSchema and pass in a dispatcher. When a Relation message is
received, put the schema into the event queue as a special message.
2. When the Postgres RecordEmitter receives the schema event, it:
   - Updates the schemas in the split for persistence (on the current master
   branch, pg cdc has no schema change event, so the information in schemas is
   never updated).
   - Compares the table changes and sends the schema DDL downstream in the
   YAML pipeline.
<img width="2442" height="1254" alt="image"
src="https://github.com/user-attachments/assets/89ed012e-867f-46d1-be43-2ce54d464220"
/>
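
As a rough illustration of the emitter side of this proposal, the sketch below keeps a per-table schema map (standing in for the schemas stored in the split), diffs an incoming schema against it, and then stores the new schema. Everything here is an assumption for illustration: `SchemaDiffSketch`, `onSchemaEvent`, and the string-based change descriptions are hypothetical and are not the Flink CDC schema change event API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of "update schema in split, then diff and emit DDL". */
final class SchemaDiffSketch {

    // Stand-in for the schemas persisted in the split: table id -> (column name -> type).
    private final Map<String, Map<String, String>> splitSchemas = new HashMap<>();

    /** Describes the changes needed to turn oldCols into newCols, matching columns by name. */
    static List<String> diff(Map<String, String> oldCols, Map<String, String> newCols) {
        List<String> changes = new ArrayList<>();
        for (Map.Entry<String, String> entry : newCols.entrySet()) {
            String oldType = oldCols.get(entry.getKey());
            if (oldType == null) {
                changes.add("ADD COLUMN " + entry.getKey() + " " + entry.getValue());
            } else if (!oldType.equals(entry.getValue())) {
                changes.add("ALTER COLUMN " + entry.getKey() + " TYPE " + entry.getValue());
            }
        }
        for (String name : oldCols.keySet()) {
            if (!newCols.containsKey(name)) {
                changes.add("DROP COLUMN " + name);
            }
        }
        return changes;
    }

    /** Handles a schema event: computes the diff, then stores the new schema for recovery. */
    List<String> onSchemaEvent(String tableId, Map<String, String> newCols) {
        Map<String, String> previous = splitSchemas.get(tableId);
        // An unknown table only registers its schema; there is nothing to diff against.
        List<String> changes = previous == null ? List.of() : diff(previous, newCols);
        splitSchemas.put(tableId, new LinkedHashMap<>(newCols));
        return changes;
    }

    Map<String, String> storedSchema(String tableId) {
        return splitSchemas.get(tableId);
    }
}
```

Matching by column name alone cannot tell a rename apart from a drop followed by an add, so a real implementation would need an additional signal for renames.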

In this way, we can avoid intrusive changes to PostgreSQL through triggers,
and there is also no need to compare schemas for every message. More
importantly, it updates the schemas stored in the split, so schema consistency
can still be guaranteed after state recovery or restart.

@linjianchang @zml1206 @leonardBang, WDYT?
