jubins opened a new pull request, #4443:
URL: https://github.com/apache/flink-cdc/pull/4443

   ## What is the purpose of the change
   
   Fixes [FLINK-39950](https://issues.apache.org/jira/browse/FLINK-39950) — 
`SqlServerUtils.getSplitColumn()` searches only `table.primaryKeyColumns()` 
when `scan.incremental.snapshot.chunk.key-column` is set, instead of searching 
all table columns. This makes it impossible to use a non-primary-key column as 
the chunk key on SQL Server, even though the feature is documented and works 
correctly in MySQL, Postgres, and Oracle connectors.
   
   As a result, any user who sets `scan.incremental.snapshot.chunk.key-column` 
to a non-PK column gets a misleading `ValidationException: Chunk key column 'X' 
doesn't exist in the primary key [...]`. Additionally, tables without a primary 
key immediately throw before even checking whether a `chunkKeyColumn` was 
provided, making the no-PK + custom chunk key use case entirely broken.
   
   The fix replaces the `primaryKeys.stream()` search with 
`table.columns().stream()` in `SqlServerUtils.getSplitColumn()`, and gates the 
"no primary key" error on `chunkKeyColumn == null`, matching the behavior of 
the Postgres and Oracle connectors.
   
   ## Brief change log
   - Fixed `getSplitColumn()` to search all table columns instead of only 
primary key columns when `chunkKeyColumn` is set
   - Updated the "no primary key" guard to allow tables without a PK when a 
`chunkKeyColumn` is explicitly provided
   - Updated error message from "doesn't exist in the primary key" to "doesn't 
exist in the columns" to accurately reflect what was searched
   - Added `SqlServerUtilsTest` with six unit tests covering all cases: default 
PK, explicit PK column, non-PK column, no-PK table with chunk key, no-PK table 
without chunk key (should throw), and non-existent column (should throw)
   
   ## Verifying this change
   
   This change is covered by new unit tests in `SqlServerUtilsTest`:
   - `testGetSplitColumnDefaultsToPrimaryKey()` — verifies the default behavior 
(no `chunkKeyColumn`) still returns the first PK column
   - `testGetSplitColumnWithPrimaryKeyColumnExplicitlySet()` — verifies a PK 
column can still be explicitly set as the chunk key
   - `testGetSplitColumnWithNonPrimaryKeyColumn()` — the core regression test; 
verifies that a non-PK column (`created_at`) is accepted correctly
   - `testGetSplitColumnWithNonPrimaryKeyColumnByName()` — verifies another 
non-PK column (`name`) works correctly
   - `testGetSplitColumnWithChunkKeyOnTableWithoutPrimaryKey()` — verifies a 
table with no PK can use a chunk key column without throwing
   - `testGetSplitColumnThrowsWhenNoPrimaryKeyAndNoChunkKeyColumn()` — verifies 
the error is still thrown when neither a PK nor a chunk key column is available
   - `testGetSplitColumnThrowsWhenChunkKeyColumnDoesNotExist()` — verifies a 
non-existent column name throws with an accurate error message
   
   ## Does this pull request potentially affect one of the following parts
   - Dependencies (does it add or upgrade a dependency): **no**
   - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: **no**
   - The serializers: **no**
   - The runtime per-record code paths (performance sensitive): **no**
   - Anything that affects deployment or recovery (JobManager, Checkpointing, 
Kubernetes/Yarn, ZooKeeper): **no**
   - The S3 file system connector: **no**
   
   ## Documentation
   - Does this pull request introduce a new feature? **no** — it fixes a bug in 
existing functionality
   - If yes, how is the feature documented? not applicable
   
   ## Was generative AI tooling used to co-author this PR?
   - [x] Yes — Claude Code was used as a pair-programming assistant. All code 
was written, understood, and verified by the author.
   Generated-by: Claude Opus 4.8


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to