[
https://issues.apache.org/jira/browse/FLINK-39169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18061660#comment-18061660
]
Luca Occhipinti edited comment on FLINK-39169 at 3/3/26 9:02 AM:
-----------------------------------------------------------------
Slowing down snapshot reads or using rate limiting can help reduce load on the
writer, but in Aurora/RDS the snapshot still hits the writer instance.
The behavior I’m proposing would be optional and Aurora/RDS-specific, so it
wouldn’t affect other MySQL deployments.
This would complement rate limiting rather than replace it, providing a safer
and more efficient way to handle large snapshots in these environments.
Happy to share a draft implementation of this if helpful.
was (Author: JIRAUSER310333):
Thanks for the clarification. Slowing down snapshot reads or using rate
limiting can help reduce load on the writer, but in Aurora/RDS the snapshot
still hits the writer instance.
The behavior I’m proposing — offloading snapshot reads to reader replicas —
would be optional and Aurora/RDS-specific, so it wouldn’t affect other MySQL
deployments.
This would complement rate limiting rather than replace it, providing a safer
and more efficient way to handle large snapshots in these environments.
Happy to share a draft implementation of this if helpful.
> [mysql-connector] Use reader instances to run snapshots
> -------------------------------------------------------
>
> Key: FLINK-39169
> URL: https://issues.apache.org/jira/browse/FLINK-39169
> Project: Flink
> Issue Type: Improvement
> Components: Flink CDC
> Reporter: Luca Occhipinti
> Priority: Major
> Labels: mysql-cdc-connector
>
> When running MySQL CDC in snapshot or initial mode (both streaming and batch)
> in cloud environments like AWS Aurora/RDS, the connector must connect to the
> primary/writer database instance to retrieve the binlog position, and it then
> continues to run the snapshot queries on that same instance.
> This creates unnecessary load on the primary/writer instance when performing
> large snapshot reads, which can impact production workloads.
> Usually there are read replicas specifically designed to offload read
> traffic.
> However, the current implementation cannot leverage these replicas for
> snapshot data reading.
> The proposal is to use the writer instance to get the binlog position, use the
> reader replica to run the snapshot queries, and, if running in streaming mode,
> keep using the writer to track binlog changes.
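
To make the proposed split concrete, here is a minimal sketch of how a connection could be routed per phase. It is only an illustration under assumptions: the class SnapshotEndpointRouter, the Phase enum, and the host names are hypothetical and are not part of the Flink CDC API. It also ignores practical concerns such as replica lag relative to the captured binlog position.

```java
import java.util.Objects;

/** Hypothetical sketch only; none of these names exist in Flink CDC. */
final class SnapshotEndpointRouter {

    /** The three phases named in the proposal. */
    enum Phase { READ_BINLOG_POSITION, SNAPSHOT_QUERY, STREAM_BINLOG }

    private final String writerHost;
    // null means the optional reader offloading is disabled (assumed convention)
    private final String readerHost;

    SnapshotEndpointRouter(String writerHost, String readerHost) {
        this.writerHost = Objects.requireNonNull(writerHost, "writerHost");
        this.readerHost = readerHost;
    }

    /** Returns the host a given phase should connect to. */
    String hostFor(Phase phase) {
        // Only snapshot queries go to the reader, and only when one is configured.
        if (phase == Phase.SNAPSHOT_QUERY && readerHost != null) {
            return readerHost;
        }
        // Binlog position lookup and binlog streaming stay on the writer.
        return writerHost;
    }
}
```

With no reader host configured, every phase falls back to the writer, which matches the intent that the behavior is optional and leaves other MySQL deployments unaffected.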