JNSimba opened a new pull request, #4453:
URL: https://github.com/apache/flink-cdc/pull/4453

   ## What is the purpose of the change
   
   The Postgres incremental source uses the Debezium connector config's snapshot
   fetch size (`connectorConfig.getSnapshotFetchSize()`, default `10240`) for the
   snapshot split read, instead of the Flink CDC option `scan.snapshot.fetch.size`
   (default `1024`). As a result:
   
   1. `scan.snapshot.fetch.size` has no effect for the Postgres source.
   2. When a snapshot chunk's row count is `<=` the fetch size, the JDBC
      server-side cursor returns the whole chunk in a single batch, loading every
      row of the chunk into memory at once. On wide tables (many columns / large
      rows) this can exhaust the heap and OOM.
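   
   To make the second point concrete, here is a rough model of how the fetch
   size bounds the rows held per batch. It is a simplification rather than the
   connector's code: the class and method names are hypothetical, the chunk size
   of 8096 rows is an illustrative assumption, and it only counts rows, ignoring
   row width and driver overhead.
   
   ```java
   public class FetchSizeModel {
       // Number of batches a cursor-based read needs to deliver a chunk (ceiling division).
       public static long batches(long chunkRows, int fetchSize) {
           if (fetchSize <= 0) {
               throw new IllegalArgumentException("fetchSize must be positive");
           }
           return (chunkRows + fetchSize - 1) / fetchSize;
       }
   
       // Upper bound on rows buffered on the client for a single batch.
       public static long maxRowsBuffered(long chunkRows, int fetchSize) {
           if (fetchSize <= 0) {
               throw new IllegalArgumentException("fetchSize must be positive");
           }
           return Math.min(chunkRows, fetchSize);
       }
   
       public static void main(String[] args) {
           long chunkRows = 8096; // assumed chunk size, for illustration only
           // Compare the two defaults named in the description above.
           for (int fetchSize : new int[] {10240, 1024}) {
               System.out.printf(
                       "fetchSize=%d -> batches=%d, maxRowsBuffered=%d%n",
                       fetchSize,
                       batches(chunkRows, fetchSize),
                       maxRowsBuffered(chunkRows, fetchSize));
           }
       }
   }
   ```
   
   Under these assumptions, a fetch size of `10240` delivers the chunk as one
   batch of 8096 rows, while `1024` splits it into 8 batches of at most 1024 rows.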
   
   The MySQL source already reads `sourceConfig.getFetchSize()` for its snapshot
   read. This PR makes the Postgres snapshot read task do the same, so that
   `scan.snapshot.fetch.size` is honored consistently across connectors.
   
   ## Brief change log
   
   - `PostgresSnapshotSplitReadTask` now takes the `PostgresSourceConfig` and uses
     `sourceConfig.getFetchSize()` for the snapshot select statement, instead of
     `connectorConfig.getSnapshotFetchSize()`.
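   
   The change in which config is consulted can be sketched with stand-in
   classes. This is not the code in the PR: `SourceConfigStub`,
   `ConnectorConfigStub`, `fetchSizeBefore`, and `fetchSizeAfter` are
   hypothetical names that only mirror the two getters mentioned above.
   
   ```java
   public class SnapshotFetchSizeSketch {
       // Hypothetical stand-in for the Debezium connector config.
       public static final class ConnectorConfigStub {
           private final int snapshotFetchSize;
   
           public ConnectorConfigStub(int snapshotFetchSize) {
               this.snapshotFetchSize = snapshotFetchSize;
           }
   
           public int getSnapshotFetchSize() {
               return snapshotFetchSize;
           }
       }
   
       // Hypothetical stand-in for the Flink CDC source config
       // (carries the value of scan.snapshot.fetch.size).
       public static final class SourceConfigStub {
           private final int fetchSize;
   
           public SourceConfigStub(int fetchSize) {
               this.fetchSize = fetchSize;
           }
   
           public int getFetchSize() {
               return fetchSize;
           }
       }
   
       // Before: only the connector config is consulted, so the user option is ignored.
       public static int fetchSizeBefore(ConnectorConfigStub connector, SourceConfigStub source) {
           return connector.getSnapshotFetchSize();
       }
   
       // After: the source config value is used for the snapshot select statement.
       public static int fetchSizeAfter(ConnectorConfigStub connector, SourceConfigStub source) {
           return source.getFetchSize();
       }
   
       public static void main(String[] args) {
           ConnectorConfigStub connector = new ConnectorConfigStub(10240); // default per the description
           SourceConfigStub source = new SourceConfigStub(256);            // hypothetical user setting
           System.out.println("before: " + fetchSizeBefore(connector, source));
           System.out.println("after:  " + fetchSizeAfter(connector, source));
       }
   }
   ```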
   
   ## Verifying this change
   
   This change is covered by existing Postgres source tests; the snapshot fetch
   size now follows the `scan.snapshot.fetch.size` option.
   
   ## Does this pull request potentially affect one of the following parts:
   
   - Dependencies (does it add or upgrade a dependency): no
   - The public API: no
   - The serializers: no
   - The runtime per-record code paths (performance sensitive): no
   - Anything that affects deployment or recovery: no
   - The S3 file system connector: no
   
   ## Documentation
   
   - Does this pull request introduce a new feature? no
   - If yes, how is the feature documented? not applicable
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
