Cq-study opened a new pull request, #642:
URL: https://github.com/apache/doris-flink-connector/pull/642
# Proposed changes
Issue Number: close #641
## Problem Summary:
In MySQL-to-Doris full-database CDC sync (`mysql-sync-database`), tables
without primary keys were not filtered automatically.
When the startup mode is `initial` (the incremental snapshot phase),
split-based snapshot reading can fail on these no-PK tables and cause the
Flink job to restart repeatedly.
This PR adds a guard in MySQL schema discovery to improve stability by
default.
### What is changed
1. In `MysqlDatabaseSync#getSchemaList`, skip tables that meet all of the
following:
- table has no primary key
- startup mode is `initial` (or empty, treated as initial)
- no chunk key is configured for that table
2. Keep no-PK tables syncable when chunk key is explicitly configured via:
- `scan.incremental.snapshot.chunk.key.column` (table-level mapping)
3. Add warning logs for skipped tables, with clear guidance on configuring a
chunk key.
4. Fail fast when all matched tables are skipped:
- throw an explicit exception describing why no table is left for
synchronization.
5. Add unit tests:
- `MysqlDatabaseSyncTest`
- `testSkipTableWithoutPrimaryKeyInInitialSnapshot`
- `testDoNotSkipTableWithoutPrimaryKeyWhenChunkKeyConfigured`
- `testDoNotSkipTableWithoutPrimaryKeyForNonInitialStartup`
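
For reviewers, the guard in items 1 to 4 can be sketched roughly as below. This is a standalone illustration, not the code in `MysqlDatabaseSync#getSchemaList`: the class `NoPkTableFilterSketch`, the `TableInfo` holder, and all method names are hypothetical, and the `database.table:column` comma-separated mapping format for `scan.incremental.snapshot.chunk.key.column` is an assumption. `System.err` stands in for the real logger.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NoPkTableFilterSketch {

    /** Hypothetical stand-in for the table metadata seen during schema discovery. */
    static final class TableInfo {
        final String identifier;        // "database.table"
        final List<String> primaryKeys; // empty when the table has no primary key

        TableInfo(String identifier, List<String> primaryKeys) {
            this.identifier = identifier;
            this.primaryKeys = primaryKeys;
        }
    }

    /** Parses "db.t1:col1,db.t2:col2" into a table-to-chunk-key map (assumed format). */
    static Map<String, String> parseChunkKeys(String raw) {
        Map<String, String> result = new HashMap<>();
        if (raw == null || raw.trim().isEmpty()) {
            return result;
        }
        for (String entry : raw.split(",")) {
            String[] parts = entry.trim().split(":", 2);
            if (parts.length == 2 && !parts[0].trim().isEmpty() && !parts[1].trim().isEmpty()) {
                result.put(parts[0].trim(), parts[1].trim());
            }
        }
        return result;
    }

    /** An empty or missing startup mode is treated as "initial". */
    static boolean isInitialStartup(String startupMode) {
        return startupMode == null
                || startupMode.trim().isEmpty()
                || "initial".equalsIgnoreCase(startupMode.trim());
    }

    /** A table is skipped only when all three conditions hold. */
    static boolean shouldSkip(TableInfo table, String startupMode, Map<String, String> chunkKeys) {
        return table.primaryKeys.isEmpty()
                && isInitialStartup(startupMode)
                && !chunkKeys.containsKey(table.identifier);
    }

    /** Drops skippable tables, warns for each, and fails fast if nothing is left. */
    static List<TableInfo> filter(List<TableInfo> matched, String startupMode, String rawChunkKeys) {
        Map<String, String> chunkKeys = parseChunkKeys(rawChunkKeys);
        List<TableInfo> kept = new ArrayList<>();
        for (TableInfo table : matched) {
            if (shouldSkip(table, startupMode, chunkKeys)) {
                System.err.println("WARN: skipping " + table.identifier
                        + " (no primary key in initial snapshot mode); configure"
                        + " scan.incremental.snapshot.chunk.key.column to sync it");
            } else {
                kept.add(table);
            }
        }
        if (!matched.isEmpty() && kept.isEmpty()) {
            throw new IllegalStateException(
                    "No table left for synchronization: every matched table lacks a"
                            + " primary key and has no chunk key configured");
        }
        return kept;
    }
}
```

With this shape, a no-PK table is dropped under `initial` (or empty) startup mode, kept once a chunk key entry names it, and kept for any other startup mode, which mirrors the three unit tests listed above.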
### Why
This avoids default job instability in common real-world schemas where some
source tables do not have primary keys, while preserving an explicit opt-in
path (chunk key) for required no-PK tables.
## Checklist (Required)
1. Does it affect the original behavior: Yes (only for no-PK tables in
initial snapshot mode without chunk key; behavior becomes safer by default)
2. Has unit tests been added: Yes
3. Has documentation been added or modified: No
4. Does it need to update dependencies: No
5. Are there any changes that cannot be rolled back: No
## Further comments
This change is intentionally scoped to the MySQL CDC database sync path and
does not alter sink logic or other source connectors.