JNSimba opened a new pull request, #4435:
URL: https://github.com/apache/flink-cdc/pull/4435

   ## What is the purpose of the change
   
   During the snapshot phase, the PostgreSQL connector reads column structure 
via JDBC `DatabaseMetaData#getColumns(catalog, schemaPattern, tableNamePattern, 
columnNamePattern)`. Per the JDBC spec, **both** `schemaPattern` and 
`tableNamePattern` are LIKE patterns, where `_` matches any single character 
and `%` matches any sequence of zero or more characters. Both characters can 
appear in PostgreSQL identifiers (`_` in any identifier, `%` only in quoted 
ones), so `getColumns` can return columns from other schemas/tables that were 
never meant to match.
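
   The matching rule can be illustrated with a small sketch. It assumes plain 
LIKE semantics with no escape character, and the class and method names 
(`LikePatternDemo`, `likeToRegex`, `likeMatches`) are hypothetical helpers, not 
part of the connector or the JDBC driver.

```java
import java.util.regex.Pattern;

class LikePatternDemo {

    /** Translates a SQL LIKE pattern into a regex (escape characters are ignored). */
    static Pattern likeToRegex(String likePattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : likePattern.toCharArray()) {
            if (c == '_') {
                regex.append('.');   // exactly one character
            } else if (c == '%') {
                regex.append(".*");  // zero or more characters
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** Returns true when the whole candidate matches the LIKE pattern. */
    static boolean likeMatches(String likePattern, String candidate) {
        return likeToRegex(likePattern).matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(likeMatches("sch_test", "sch_test")); // true
        System.out.println(likeMatches("sch_test", "schxtest")); // true: '_' matches 'x'
        System.out.println(likeMatches("schxtest", "sch_test")); // false: 'x' is a literal
    }
}
```

   The match is one-directional: `sch_test` used as a pattern also covers 
`schxtest`, but not the other way around.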
   
   An exact filter on the **table name** already exists, but the **schema name 
was never validated**. When one schema name, read as a LIKE pattern, also 
matches another (e.g. `sch_test` matches `schxtest`, because `_` matches `x`) 
and both schemas contain a same-named table, capturing `sch_test.<table>` also 
pulls in the look-alike schema's columns. The table-name filter cannot tell 
them apart, so the snapshot fails with 
`IllegalStateException: Duplicate key Optional.empty` (or, when the columns 
differ, silently merges columns from the wrong schema).
   
   ## Brief change log
   
   - `PostgresConnection#doReadTableColumn`: also compare the result-set 
`TABLE_SCHEM` (column 2) against `TableId.schema()`, in addition to the 
existing `TABLE_NAME` check. The schema check is skipped when the requested 
schema is `null`, so columns are not dropped when no schema is specified. This 
is a pure after-the-fact filter; it does not change the metadata query and does 
not affect normal (non-wildcard) schemas/tables.
   - `SimilarTableNamesITCase` / `similar_names.sql`: add a cross-schema case 
with two schemas (`sch_test` / `schxtest`), where the first name read as a LIKE 
pattern also matches the second. Each schema holds a same-named table, and the 
test verifies that only the target schema's snapshot and incremental data are 
captured.
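
   The exact-match filter described in the change log can be sketched as 
follows. This is an illustration under stated assumptions, not the code in 
`PostgresConnection#doReadTableColumn`: `ColumnRow`, `filterExact`, and 
`sampleRows` are hypothetical names standing in for the `getColumns` result-set 
rows and the connector's own filtering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ExactSchemaFilterSketch {

    /** Hypothetical stand-in for one getColumns row: TABLE_SCHEM, TABLE_NAME, COLUMN_NAME. */
    static final class ColumnRow {
        final String schema;
        final String table;
        final String column;

        ColumnRow(String schema, String table, String column) {
            this.schema = schema;
            this.table = table;
            this.column = column;
        }
    }

    /** Keeps only rows whose schema and table equal the requested names exactly. */
    static List<ColumnRow> filterExact(
            List<ColumnRow> rows, String wantedSchema, String wantedTable) {
        List<ColumnRow> kept = new ArrayList<>();
        for (ColumnRow row : rows) {
            boolean tableMatches = wantedTable.equals(row.table);
            // Skip the schema check when no schema was requested.
            boolean schemaMatches = wantedSchema == null || wantedSchema.equals(row.schema);
            if (tableMatches && schemaMatches) {
                kept.add(row);
            }
        }
        return kept;
    }

    /** Rows a LIKE-based lookup for schema pattern "sch_test" could return. */
    static List<ColumnRow> sampleRows() {
        return Arrays.asList(
                new ColumnRow("sch_test", "orders", "id"),
                new ColumnRow("schxtest", "orders", "id"));
    }

    public static void main(String[] args) {
        System.out.println(filterExact(sampleRows(), "sch_test", "orders").size()); // 1
        System.out.println(filterExact(sampleRows(), null, "orders").size());       // 2
    }
}
```

   Filtering after the fact leaves the metadata query untouched, which matches 
the PR's stated goal of not affecting schemas and tables without wildcard 
characters.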
   
   ## Verifying this change
   
   This change added tests and can be verified as follows:
   
   - Added 
`SimilarTableNamesITCase#testReadTableWithSimilarSchemaNameUnderscore`, which 
fails during the snapshot with 
`IllegalStateException: Duplicate key Optional.empty` without the fix and 
passes with it.
   
   ## Does this pull request potentially affect one of the following parts:
   
   - Dependencies (does it add or upgrade a dependency): no
   - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
   - The serializers: no
   - The runtime per-record code paths (performance sensitive): no
   - Anything that affects deployment or recovery: no
   - The connector code base: yes (postgres-cdc snapshot column reading)
   
   ## Documentation
   
   - Does this pull request introduce a new feature? no
   - If yes, how is the feature documented? not applicable
   

