JeonDaehong opened a new issue, #15584: URL: https://github.com/apache/iceberg/issues/15584
### Apache Iceberg version

1.6.1

### Query engine

Kafka Connect

### Please describe the bug 🐞

## Summary

When configuring a single Iceberg Kafka Connect Sink connector to route **multiple topics to multiple tables** (without `iceberg.tables.route-field`), records from all topics get written to **all** tables if the target tables share the same `id-columns` name and nearly identical schemas.

---

## Environment

| Setting | Value |
|---|---|
| Connector | Iceberg Sink Connector (Tabular) |
| Catalog | AWS Glue |
| Storage | Amazon S3 |
| Source | Debezium MySQL CDC (Amazon MSK) |
| `tasks.max` | `1` |
| `iceberg.tables.route-field` | not set |

---

## Steps to Reproduce

### ✅ Works: different `id-columns` & different schemas

```json
"topics": "db.schema_a.table_a,db.schema_a.table_b",
"iceberg.tables": "catalog.table_a,catalog.table_b",
"iceberg.table.catalog.table_a.id-columns": "fund_balance_id",
"iceberg.table.catalog.table_b.id-columns": "loan_application_id"
```

→ Each topic's records are correctly routed to their respective Iceberg tables. ✅

---

### ❌ Fails: same `id-columns` name & nearly identical schemas

```json
"topics": "db.schema_b.table_c,db.schema_b.table_d",
"iceberg.tables": "catalog.table_c,catalog.table_d",
"iceberg.table.catalog.table_c.id-columns": "id",
"iceberg.table.catalog.table_d.id-columns": "id"
```

→ All records from **both topics** get written to **both tables**. ❌

---

## Expected Behavior

Records from `db.schema_b.table_c` should be written **only** to `catalog.table_c`, and records from `db.schema_b.table_d` should be written **only** to `catalog.table_d`, based on topic-to-table name matching (last segment).
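To make the expectation concrete, here is a minimal Python sketch of last-segment matching. It only illustrates the routing I expected and is not the connector's actual implementation; `route_by_topic_suffix` is a hypothetical helper name.

```python
def route_by_topic_suffix(topic: str, tables: list[str]) -> list[str]:
    """Return the tables whose last name segment equals the topic's last segment.

    Hypothetical helper: sketches the expected behavior only, not connector code.
    """
    topic_suffix = topic.rsplit(".", 1)[-1]
    return [t for t in tables if t.rsplit(".", 1)[-1] == topic_suffix]


tables = ["catalog.table_c", "catalog.table_d"]

# Each topic should resolve to exactly one target table.
for topic in ("db.schema_b.table_c", "db.schema_b.table_d"):
    print(topic, "->", route_by_topic_suffix(topic, tables))
```

Under this rule neither `id-columns` nor the table schema plays any part in the routing decision, which is why I did not expect those settings to matter.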
## Actual Behavior

- Both tables end up with **identical row counts**
- Records from the wrong topic appear in each table with **NULL columns**
- The connector appears to broadcast records to all registered tables instead of routing by topic

---

## Analysis

Comparing the working and failing cases:

- **`id-columns`**: different per table (`fund_balance_id` / `loan_application_id`) vs. same (`id` / `id`)
- **Table schemas**: completely different structures vs. nearly identical (both have `id`, `modified_datetime`, etc.)
- **Topic suffix → table name matching**: looks correct in both cases
- **`iceberg.control.topic`**: unique per connector in both cases

**Hypothesis:** The routing logic may use schema-based matching (e.g. field name overlap) as a fallback or in conjunction with topic name matching. When two tables share the same `id-columns` name and overlapping field names, the connector may fail to discriminate records by topic and instead broadcast to all candidate tables.

---

## Workaround

Splitting into **1 connector per topic/table** resolves the issue immediately:

```json
// Connector A
"topics": "db.schema_b.table_c",
"iceberg.tables": "catalog.table_c",
"iceberg.table.catalog.table_c.id-columns": "id"

// Connector B
"topics": "db.schema_b.table_d",
"iceberg.tables": "catalog.table_d",
"iceberg.table.catalog.table_d.id-columns": "id"
```

---

## Questions / Things to Clarify

1. Is multi-topic → multi-table routing **without `route-field`** officially supported? Or is 1 connector per table the intended pattern?
2. Could having the **same `id-columns` name** (`id`) across multiple tables cause the routing to break?
3. Is there a known issue where routing breaks when target tables share **nearly identical schemas**?
4. Is **topic-name-based routing** (matching last topic segment to table name) supported without SMT?
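To show why the symptoms under Actual Behavior look like a broadcast, here is a small Python simulation. It is a sketch under assumptions and not the connector's code: the `broadcast` function is hypothetical, the sample values are made up, and `c_only` / `d_only` are stand-in names for whatever columns actually differ between the two tables.

```python
from collections import defaultdict

# `id` and `modified_datetime` come from the report; `c_only` and `d_only`
# are hypothetical stand-ins for the columns that differ between the tables.
TABLE_COLUMNS = {
    "catalog.table_c": ["id", "modified_datetime", "c_only"],
    "catalog.table_d": ["id", "modified_datetime", "d_only"],
}


def broadcast(records, table_columns):
    """Write every record to every table, ignoring the source topic."""
    written = defaultdict(list)
    for _topic, value in records:
        for table, columns in table_columns.items():
            # Fields the record lacks become None, i.e. NULL in the table.
            written[table].append({col: value.get(col) for col in columns})
    return dict(written)


# Made-up sample records, one per topic.
records = [
    ("db.schema_b.table_c", {"id": 1, "modified_datetime": "2024-01-01", "c_only": "x"}),
    ("db.schema_b.table_d", {"id": 1, "modified_datetime": "2024-01-02", "d_only": "y"}),
]

result = broadcast(records, TABLE_COLUMNS)
for table, rows in result.items():
    print(table, len(rows), rows)
```

In this simulation both tables receive the same number of rows, and each row that came from the other topic has `None` in the table-specific column, which matches the identical row counts and NULL columns described above.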
---

## Willingness to Contribute

If this is confirmed as a bug, **I'm happy to investigate the root cause and submit a fix.** It would help to know:

- Which part of the codebase handles topic → table routing resolution
- Whether schema-based matching plays any role in the routing decision
- Any existing tests covering multi-topic / multi-table scenarios with identical `id-columns`

Happy to discuss further before diving in. Thanks!

### Willingness to contribute

- [x] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
