[ 
https://issues.apache.org/jira/browse/FLINK-40007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Di Wu updated FLINK-40007:
--------------------------
    Description: 
The Postgres incremental source uses the Debezium connector config's snapshot

  fetch size (connectorConfig.getSnapshotFetchSize(), default 10240) for the

  snapshot split read, instead of the Flink CDC option scan.snapshot.fetch.size

  (default 1024). As a result:

  

  1. scan.snapshot.fetch.size has no effect for the Postgres source.

  2. When a snapshot chunk's row count is <= the fetch size, the JDBC 
server-side

     cursor returns the whole chunk in a single batch, loading every row of the

     chunk into memory at once. On wide tables (many columns / large rows) this

     can exhaust the heap and OOM.

     

  The MySQL source already uses sourceConfig.getFetchSize() for its snapshot 
read.

  The Postgres snapshot read task should do the same so that 
scan.snapshot.fetch.size

  is honored consistently across connectors.

 

Releate pr: https://github.com/apache/flink-cdc/pull/2766/changes

  was:
The Postgres incremental source uses the Debezium connector config's snapshot

  fetch size (connectorConfig.getSnapshotFetchSize(), default 10240) for the

  snapshot split read, instead of the Flink CDC option scan.snapshot.fetch.size

  (default 1024). As a result: 

  

  1. scan.snapshot.fetch.size has no effect for the Postgres source.

  2. When a snapshot chunk's row count is <= the fetch size, the JDBC 
server-side

     cursor returns the whole chunk in a single batch, loading every row of the

     chunk into memory at once. On wide tables (many columns / large rows) this

     can exhaust the heap and OOM. 

     

  The MySQL source already uses sourceConfig.getFetchSize() for its snapshot 
read.

  The Postgres snapshot read task should do the same so that 
scan.snapshot.fetch.size

  is honored consistently across connectors.


> Postgres CDC ignores scan.snapshot.fetch.size during snapshot, causing OOM on 
> wide tables
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-40007
>                 URL: https://issues.apache.org/jira/browse/FLINK-40007
>             Project: Flink
>          Issue Type: Bug
>          Components: Flink CDC
>    Affects Versions: cdc-3.6.0
>            Reporter: Di Wu
>            Priority: Major
>
> The Postgres incremental source uses the Debezium connector config's snapshot
>   fetch size (connectorConfig.getSnapshotFetchSize(), default 10240) for the
>   snapshot split read, instead of the Flink CDC option 
> scan.snapshot.fetch.size
>   (default 1024). As a result:
>   
>   1. scan.snapshot.fetch.size has no effect for the Postgres source.
>   2. When a snapshot chunk's row count is <= the fetch size, the JDBC 
> server-side
>      cursor returns the whole chunk in a single batch, loading every row of 
> the
>      chunk into memory at once. On wide tables (many columns / large rows) 
> this
>      can exhaust the heap and OOM.
>      
>   The MySQL source already uses sourceConfig.getFetchSize() for its snapshot 
> read.
>   The Postgres snapshot read task should do the same so that 
> scan.snapshot.fetch.size
>   is honored consistently across connectors.
>  
> Releate pr: https://github.com/apache/flink-cdc/pull/2766/changes



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to