J9527H commented on PR #4246: URL: https://github.com/apache/flink-cdc/pull/4246#issuecomment-4841561001
Hi @ThorneANN @yuxiqian, thanks for working on this; we'd really benefit from this feature.

We're running a production MySQL CDC pipeline using the DataStream API (`MySqlSource` + custom `DebeziumDeserializationSchema`) on Apache Flink (Flink 2.2 / flink-cdc 3.6.0-2.2). Our job relies on custom parsing, filtering, and DLQ routing logic that's tightly coupled to the DataStream API, so migrating to the Pipeline (YAML) connector isn't an option for us without losing that flexibility.

Our real-world use case: we periodically need to add new tables to a long-running job, but **we don't need historical/snapshot data for those new tables**, only incremental binlog events going forward. Today our only options are:

1. `scanNewlyAddedTableEnabled(true)`: always triggers a snapshot phase, which doesn't match our requirement, and per #2105 has also been reported to occasionally hang during the snapshot phase.
2. Run a separate, independent job per newly-added table using `StartupOptions.latest()`: works, but doesn't scale operationally.
3. Manually extract the last committed binlog offset from TaskManager checkpoint logs and cold-start with `StartupOptions.specificOffset(...)`: works, but is manual, error-prone, and not officially documented for this use case (adding tables vs. failure recovery).

Having `scanBinlogNewlyAddedTableEnabled` on `MySqlSourceBuilder`, consistent with what's already available in `MySqlDataSourceFactory` for the Pipeline connector, would let us avoid all three workarounds above.

Happy to test against a patched build if that helps move this forward. Let me know if there's anything I can do to help (testing, providing more context on our use case, etc.).
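For concreteness, here is a rough sketch of how the three workarounds above look on the DataStream side. It is illustrative only and makes several assumptions: the package names are the flink-cdc 3.x ones, `JsonDebeziumDeserializationSchema` stands in for our custom `DebeziumDeserializationSchema`, and the host, credentials, table names, and binlog file/position are placeholders. The proposed `scanBinlogNewlyAddedTableEnabled` call is left commented out, since it is what this PR would add and is not assumed to exist on a released `MySqlSourceBuilder`.

```java
import org.apache.flink.cdc.connectors.mysql.source.MySqlSource;
import org.apache.flink.cdc.connectors.mysql.source.MySqlSourceBuilder;
import org.apache.flink.cdc.connectors.mysql.table.StartupOptions;
import org.apache.flink.cdc.debezium.JsonDebeziumDeserializationSchema;

public class NewTableSourceSketch {

    // Shared builder settings; every value here is a placeholder.
    static MySqlSourceBuilder<String> base(String... tables) {
        return MySqlSource.<String>builder()
                .hostname("mysql.example.internal")
                .port(3306)
                .databaseList("app_db")
                .tableList(tables)
                .username("cdc_user")
                .password("change-me")
                // Stand-in for the custom DebeziumDeserializationSchema.
                .deserializer(new JsonDebeziumDeserializationSchema());
    }

    // Workaround 1: restore the existing job with the enlarged table list.
    // Newly added tables go through a snapshot phase before binlog reading.
    static MySqlSource<String> snapshotNewTables() {
        return base("app_db.orders", "app_db.new_table")
                .startupOptions(StartupOptions.initial())
                .scanNewlyAddedTableEnabled(true)
                .build();
    }

    // Workaround 2: a separate job per new table, binlog only from "now".
    static MySqlSource<String> separateJobForNewTable() {
        return base("app_db.new_table")
                .startupOptions(StartupOptions.latest())
                .build();
    }

    // Workaround 3: cold-start the whole job from a manually recovered offset.
    // The file name and position below are placeholders, not real values.
    static MySqlSource<String> coldStartFromOffset() {
        return base("app_db.orders", "app_db.new_table")
                .startupOptions(StartupOptions.specificOffset("mysql-bin.000003", 4L))
                .build();
    }

    // What we are hoping for (proposed in this PR, hence commented out):
    //   base("app_db.orders", "app_db.new_table")
    //       .scanBinlogNewlyAddedTableEnabled(true)
    //       .build();
}
```

Each method only constructs a source; wiring it into a job would still go through `env.fromSource(...)` as usual.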
