JeonDaehong opened a new issue, #15584:
URL: https://github.com/apache/iceberg/issues/15584

   ### Apache Iceberg version
   
   1.6.1
   
   ### Query engine
   
   Kafka Connect
   
   ### Please describe the bug 🐞
   
   ## Summary
   
   When a single Iceberg Kafka Connect Sink connector is configured to route 
**multiple topics to multiple tables** (without `iceberg.tables.route-field`), 
records from every topic are written to **all** tables if the target tables 
use the same `id-columns` name and have nearly identical schemas.
   
   ---
   
   ## Environment
   
   | | |
   |---|---|
   | Connector | Iceberg Sink Connector (Tabular) |
   | Catalog | AWS Glue |
   | Storage | Amazon S3 |
   | Source | Debezium MySQL CDC (Amazon MSK) |
   | `tasks.max` | `1` |
   | `iceberg.tables.route-field` | not set |
   
   ---
   
   ## Steps to Reproduce
   
   ### ✅ Works — different `id-columns` & different schemas
   
   ```json
   "topics": "db.schema_a.table_a,db.schema_a.table_b",
   "iceberg.tables": "catalog.table_a,catalog.table_b",
   "iceberg.table.catalog.table_a.id-columns": "fund_balance_id",
   "iceberg.table.catalog.table_b.id-columns": "loan_application_id"
   ```
   
   → Each topic's records are correctly routed to the corresponding Iceberg 
table. ✅
   
   ---
   
   ### ❌ Fails — same `id-columns` name & nearly identical schemas
   
   ```json
   "topics": "db.schema_b.table_c,db.schema_b.table_d",
   "iceberg.tables": "catalog.table_c,catalog.table_d",
   "iceberg.table.catalog.table_c.id-columns": "id",
   "iceberg.table.catalog.table_d.id-columns": "id"
   ```
   
   → All records from **both topics** get written to **both tables**. ❌
   
   ---
   
   ## Expected Behavior
   
   Records from `db.schema_b.table_c` should be written **only** to 
`catalog.table_c`, and records from `db.schema_b.table_d` should be written 
**only** to `catalog.table_d` — based on topic-to-table name matching (last 
segment).
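
The mapping I expected can be sketched as follows. This only illustrates the behavior described above and is not the connector's routing code: `expected_table_for_topic` is a hypothetical helper, and matching on the last dot-separated segment is my assumption about how routing should work.

```python
from typing import Optional, Sequence


def expected_table_for_topic(topic: str, tables: Sequence[str]) -> Optional[str]:
    """Return the table whose last segment equals the topic's last segment."""
    topic_suffix = topic.rsplit(".", 1)[-1]
    for table in tables:
        if table.rsplit(".", 1)[-1] == topic_suffix:
            return table
    # No table shares the topic's last segment.
    return None


tables = ["catalog.table_c", "catalog.table_d"]
for topic in ["db.schema_b.table_c", "db.schema_b.table_d"]:
    print(topic, "->", expected_table_for_topic(topic, tables))
```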
   
   ## Actual Behavior
   
   - Both tables end up with **identical row counts**
   - Records from the wrong topic appear in each table with **NULL columns**
   - The connector appears to broadcast records to all registered tables 
instead of routing by topic
   
   ---
   
   ## Analysis
   
   The two cases differ in the following ways:
   
   - **`id-columns`**: different per table (`fund_balance_id` / 
`loan_application_id`) vs. same (`id` / `id`)
   - **Table schemas**: completely different structures vs. nearly identical 
(both have `id`, `modified_datetime`, etc.)
   - **Topic suffix → table name matching**: looks correct in both cases
   - **`iceberg.control.topic`**: unique per connector in both cases
   
   **Hypothesis:** The routing logic may use schema-based matching (e.g. field 
name overlap) as a fallback or in conjunction with topic name matching. When 
two tables share the same `id-columns` name and overlapping field names, the 
connector may fail to discriminate records by topic and instead broadcasts to 
all candidate tables.
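
To make the symptoms concrete, here is a toy model (not the connector's code) that compares a writer broadcasting every record to every configured table with one that routes by topic suffix. The function names, the sample records, and the `c_only` / `d_only` columns are made up for illustration. Under broadcast, both tables end up with the same row count and the rows from the other topic carry `None` in the columns they lack, which is consistent with what I observe.

```python
from collections import defaultdict

# Hypothetical, nearly identical table schemas (only the last column differs).
TABLE_COLUMNS = {
    "catalog.table_c": ["id", "modified_datetime", "c_only"],
    "catalog.table_d": ["id", "modified_datetime", "d_only"],
}

# Hypothetical records as (topic, value) pairs.
RECORDS = [
    ("db.schema_b.table_c", {"id": 1, "modified_datetime": "2024-01-01", "c_only": "x"}),
    ("db.schema_b.table_d", {"id": 2, "modified_datetime": "2024-01-02", "d_only": "y"}),
]


def project(record, columns):
    """Fit a record to a table's columns; absent fields become None (NULL)."""
    return {col: record.get(col) for col in columns}


def write(records, route_by_topic):
    """Write records to tables, either broadcasting or routing by topic suffix."""
    tables = defaultdict(list)
    for topic, value in records:
        for table, columns in TABLE_COLUMNS.items():
            same_suffix = table.rsplit(".", 1)[-1] == topic.rsplit(".", 1)[-1]
            if route_by_topic and not same_suffix:
                continue
            tables[table].append(project(value, columns))
    return dict(tables)


broadcast = write(RECORDS, route_by_topic=False)
routed = write(RECORDS, route_by_topic=True)

for name, result in [("broadcast", broadcast), ("routed", routed)]:
    print(name, {table: len(rows) for table, rows in result.items()})
```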
   
   ---
   
   ## Workaround
   
   Splitting into **1 connector per topic/table** resolves the issue 
immediately:
   
   ```json
   // Connector A
   "topics": "db.schema_b.table_c",
   "iceberg.tables": "catalog.table_c",
   "iceberg.table.catalog.table_c.id-columns": "id"
   
   // Connector B
   "topics": "db.schema_b.table_d",
   "iceberg.tables": "catalog.table_d",
   "iceberg.table.catalog.table_d.id-columns": "id"
   ```
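
For more than a handful of tables, the per-table configs can be generated rather than written by hand. This is a minimal sketch, assuming each topic maps to the table with the same last segment and every table uses `id` as its `id-columns`. `split_connector_configs` and the `iceberg-sink-<table>` connector names are hypothetical, and the other required settings (connector class, catalog properties, control topic) are omitted.

```python
import json


def split_connector_configs(topics, catalog="catalog", id_columns="id"):
    """Build one connector config fragment per topic/table pair."""
    configs = {}
    for topic in topics:
        table_name = topic.rsplit(".", 1)[-1]
        table = f"{catalog}.{table_name}"
        # Connector name is a made-up convention for this sketch.
        configs[f"iceberg-sink-{table_name}"] = {
            "topics": topic,
            "iceberg.tables": table,
            f"iceberg.table.{table}.id-columns": id_columns,
        }
    return configs


configs = split_connector_configs(["db.schema_b.table_c", "db.schema_b.table_d"])
print(json.dumps(configs, indent=2))
```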
   
   ---
   
   ## Questions / Things to Clarify
   
   1. Is multi-topic → multi-table routing **without `route-field`** officially 
supported? Or is 1 connector per table the intended pattern?
   2. Could having the **same `id-columns` name** (`id`) across multiple tables 
cause the routing to break?
   3. Is there a known issue where routing breaks when target tables share 
**nearly identical schemas**?
   4. Is **topic-name-based routing** (matching last topic segment to table 
name) supported without SMT?
   
   ---
   
   ## Willingness to Contribute
   
   If this is confirmed as a bug, **I'm happy to investigate the root cause and 
submit a fix.** It would help to know:
   
   - Which part of the codebase handles topic → table routing resolution
   - Whether schema-based matching plays any role in the routing decision
   - Any existing tests covering multi-topic / multi-table scenarios with 
identical `id-columns`
   
   Happy to discuss further before diving in. Thanks!
   
   ### Willingness to contribute
   
   - [x] I can contribute a fix for this bug independently
   - [ ] I would be willing to contribute a fix for this bug with guidance from 
the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

