ouyangwulin created FLINK-38277:
-----------------------------------

             Summary: Enhance PostgreSQL slot management capabilities
                 Key: FLINK-38277
                 URL: https://issues.apache.org/jira/browse/FLINK-38277
             Project: Flink
          Issue Type: Improvement
          Components: Flink CDC
    Affects Versions: cdc-3.5.0
            Reporter: ouyangwulin


1. Background
A replication slot is an important PostgreSQL mechanism that is closely tied to
the write-ahead log (WAL). It is used by streaming replication and logical
replication to ensure that the primary does not remove WAL that a standby (or
another consumer) still needs. The Postgres connector uses logical replication
for incremental data synchronization, and Flink CDC supports both batch mode and
streaming mode. If the slot is not dropped after a batch job, the primary keeps
retaining WAL for it, so WAL keeps accumulating and can occupy a large amount of
disk space.
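The disk cost of a leftover slot can be estimated from two LSNs: the server's
current WAL position and the slot's restart_lsn (the oldest WAL the slot may
still need). In PostgreSQL these come from pg_current_wal_lsn() and the
restart_lsn column of the pg_replication_slots view, and pg_wal_lsn_diff() does
the same subtraction server-side. The following is only an illustrative sketch
of that arithmetic: the class and method names are hypothetical, and the LSNs
are hand-picked sample values rather than data from a real server.

```java
public class SlotWalRetention {

    // Parses a PostgreSQL LSN written as "X/Y" (two hex numbers) into a 64-bit byte position.
    static long parseLsn(String lsn) {
        String[] parts = lsn.split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("not an LSN: " + lsn);
        }
        long high = Long.parseLong(parts[0], 16);
        long low = Long.parseLong(parts[1], 16);
        return (high << 32) | low;
    }

    // Approximate bytes of WAL the primary must keep for a slot: everything from the
    // slot's restart_lsn up to the current WAL position.
    static long retainedBytes(String currentWalLsn, String slotRestartLsn) {
        return Math.max(0L, parseLsn(currentWalLsn) - parseLsn(slotRestartLsn));
    }

    public static void main(String[] args) {
        // Hand-picked sample values standing in for pg_current_wal_lsn() and
        // pg_replication_slots.restart_lsn of a slot left behind by a finished batch job.
        long bytes = retainedBytes("1/0", "0/0");
        System.out.println(bytes + " bytes of WAL retained"); // 4294967296 bytes of WAL retained
    }
}
```

Because restart_lsn never advances for a slot that nobody consumes, this number
only grows as the primary writes more WAL.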
2. Proposed solution
2.1. Batch mode
For pipeline jobs, batch mode is enabled with execution.runtime-mode=batch,
which requires scan.startup.mode=snapshot. The job should not create a slot when
it starts, and it must drop any slot when it finishes; otherwise slots are left
behind.
For SQL jobs, scan.startup.mode=snapshot makes the job read only the full
(snapshot) data. It reads no incremental data, not even the backfill data (the
WAL changes normally replayed to make each snapshot split consistent). Likewise,
the job should not create a slot when it starts, and it must drop any slot when
it finishes; otherwise slots are left behind.
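The batch-mode rules amount to a small decision table. The following is a
minimal sketch of it, assuming a hypothetical SlotLifecycle helper: RuntimeMode,
StartupMode, SlotPlan and plan() are illustrative names, not Flink CDC APIs, and
StartupMode only models the default initial mode alongside snapshot.

```java
public class SlotLifecycle {

    enum RuntimeMode { STREAMING, BATCH }

    // Only the two startup modes relevant here are modelled (assumption).
    enum StartupMode { INITIAL, SNAPSHOT }

    // What the source does with the replication slot at job start and at job finish.
    record SlotPlan(boolean createOnStart, boolean dropOnFinish) {}

    static SlotPlan plan(RuntimeMode runtime, StartupMode startup) {
        if (runtime == RuntimeMode.BATCH && startup != StartupMode.SNAPSHOT) {
            // Batch pipelines only run with scan.startup.mode=snapshot.
            throw new IllegalArgumentException(
                    "execution.runtime-mode=batch requires scan.startup.mode=snapshot");
        }
        boolean snapshotOnly = startup == StartupMode.SNAPSHOT;
        // Snapshot-only jobs read no WAL at all (not even backfill), so they need no slot,
        // and any slot that exists at the end is dropped so the primary can recycle WAL.
        // Jobs that read the WAL create the slot and keep it when the job ends.
        return new SlotPlan(!snapshotOnly, snapshotOnly);
    }

    public static void main(String[] args) {
        System.out.println(plan(RuntimeMode.BATCH, StartupMode.SNAPSHOT));
        // SlotPlan[createOnStart=false, dropOnFinish=true]
        System.out.println(plan(RuntimeMode.STREAMING, StartupMode.INITIAL));
        // SlotPlan[createOnStart=true, dropOnFinish=false]
    }
}
```

Rejecting batch mode with a non-snapshot startup mode up front keeps the
invalid combination from ever creating a slot it would then have to clean up.
This sketch covers only the main slot; child slots created by streaming backfill
are a separate concern.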
2.2. Streaming mode
When a streaming job stops, its slot should not be dropped; otherwise the job
state is lost, because the WAL position the job would resume from is no longer
retained. However, backfill in streaming mode creates a child slot, which must
be dropped when the backfill finishes; otherwise the child slot is left behind.
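A minimal sketch of the streaming-mode rule, assuming a hypothetical
StreamingSlotManager that only tracks slot names in memory; the
<main>_<subtaskId> naming for child slots is also an assumption. A real
implementation would call PostgreSQL's pg_create_logical_replication_slot() and
pg_drop_replication_slot() instead of editing a Set.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class StreamingSlotManager {

    private final String mainSlot;
    private final Set<String> liveSlots = new LinkedHashSet<>();

    StreamingSlotManager(String mainSlot) {
        this.mainSlot = mainSlot;
        liveSlots.add(mainSlot); // the main slot outlives job stops and restarts
    }

    // Hypothetical naming scheme for the child slot used by one subtask's backfill.
    String childSlotName(int subtaskId) {
        return mainSlot + "_" + subtaskId;
    }

    // Registers (in real code: creates) the child slot for a backfill and returns its name.
    String startBackfill(int subtaskId) {
        String child = childSlotName(subtaskId);
        liveSlots.add(child);
        return child;
    }

    // Removes (in real code: drops) a child slot; call it from a finally block so a
    // failed backfill does not leave the slot behind.
    void finishBackfill(String childSlot) {
        if (childSlot.equals(mainSlot)) {
            throw new IllegalArgumentException("the main slot must not be dropped in streaming mode");
        }
        liveSlots.remove(childSlot);
    }

    // Stopping a streaming job keeps the main slot so the job can resume from its state;
    // any child slot still registered is cleaned up.
    void stop() {
        liveSlots.removeIf(slot -> !slot.equals(mainSlot));
    }

    Set<String> liveSlots() {
        return Set.copyOf(liveSlots);
    }

    public static void main(String[] args) {
        StreamingSlotManager manager = new StreamingSlotManager("flink");
        String child = manager.startBackfill(0);
        try {
            // ... read this split's backfill WAL range through the child slot ...
        } finally {
            manager.finishBackfill(child);
        }
        manager.startBackfill(1); // a backfill still running when the job stops
        manager.stop();
        System.out.println(manager.liveSlots()); // [flink]
    }
}
```

Dropping the child slot in a finally block is the design choice that matters
here: a backfill that fails part-way would otherwise leak exactly the kind of
leftover child slot this issue describes.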



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
