jubins opened a new pull request, #4443: URL: https://github.com/apache/flink-cdc/pull/4443
## What is the purpose of the change Fixes [FLINK-39950](https://issues.apache.org/jira/browse/FLINK-39950) — `SqlServerUtils.getSplitColumn()` searches only `table.primaryKeyColumns()` when `scan.incremental.snapshot.chunk.key-column` is set, instead of searching all table columns. This makes it impossible to use a non-primary-key column as the chunk key on SQL Server, even though the feature is documented and works correctly in MySQL, Postgres, and Oracle connectors. As a result, any user who sets `scan.incremental.snapshot.chunk.key-column` to a non-PK column gets a misleading `ValidationException: Chunk key column 'X' doesn't exist in the primary key [...]`. Additionally, tables without a primary key immediately throw before even checking whether a `chunkKeyColumn` was provided, making the no-PK + custom chunk key use case entirely broken. The fix replaces the `primaryKeys.stream()` search with `table.columns().stream()` in `SqlServerUtils.getSplitColumn()`, and gates the "no primary key" error on `chunkKeyColumn == null`, matching the behavior of the Postgres and Oracle connectors. ## Brief change log - Fixed `getSplitColumn()` to search all table columns instead of only primary key columns when `chunkKeyColumn` is set - Updated the "no primary key" guard to allow tables without a PK when a `chunkKeyColumn` is explicitly provided - Updated error message from "doesn't exist in the primary key" to "doesn't exist in the columns" to accurately reflect what was searched - Added `SqlServerUtilsTest` with six unit tests covering all cases: default PK, explicit PK column, non-PK column, no-PK table with chunk key, no-PK table without chunk key (should throw), and non-existent column (should throw) ## Verifying this change This change is covered by new unit tests in `SqlServerUtilsTest`: - `testGetSplitColumnDefaultsToPrimaryKey()` — verifies the default behavior (no `chunkKeyColumn`) still returns the first PK column - `testGetSplitColumnWithPrimaryKeyColumnExplicitlySet()` — verifies a PK column can still be explicitly set as the chunk key - `testGetSplitColumnWithNonPrimaryKeyColumn()` — the core regression test; verifies that a non-PK column (`created_at`) is accepted correctly - `testGetSplitColumnWithNonPrimaryKeyColumnByName()` — verifies another non-PK column (`name`) works correctly - `testGetSplitColumnWithChunkKeyOnTableWithoutPrimaryKey()` — verifies a table with no PK can use a chunk key column without throwing - `testGetSplitColumnThrowsWhenNoPrimaryKeyAndNoChunkKeyColumn()` — verifies the error is still thrown when neither a PK nor a chunk key column is available - `testGetSplitColumnThrowsWhenChunkKeyColumnDoesNotExist()` — verifies a non-existent column name throws with an accurate error message ## Does this pull request potentially affect one of the following parts - Dependencies (does it add or upgrade a dependency): **no** - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: **no** - The serializers: **no** - The runtime per-record code paths (performance sensitive): **no** - Anything that affects deployment or recovery (JobManager, Checkpointing, Kubernetes/Yarn, ZooKeeper): **no** - The S3 file system connector: **no** ## Documentation - Does this pull request introduce a new feature? **no** — it fixes a bug in existing functionality - If yes, how is the feature documented? not applicable ## Was generative AI tooling used to co-author this PR? - [x] Yes — Claude Code was used as a pair-programming assistant. All code was written, understood, and verified by the author. Generated-by: Claude Opus 4.8 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
