Hello Iceberg dev team,

We have been trying to set up a CDC pipeline on Kafka Connect to push data
to our data lake in AWS, with Glue as both the catalog and the schema registry.

We use a topic rename in the MySQL Debezium connector to route all tables
into a single topic per tenant.
We ran a sample test with a single table whose primary key column is id.
The schema registered in Glue correctly identifies PK=id.
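For reference, our source connector config looks roughly like the sketch below, which uses Debezium's standard RegexRouter SMT to collapse the per-table topics into one topic per tenant (hostnames, credentials, and the tenant/topic names are illustrative, not our actual values):

```properties
# Hypothetical Debezium MySQL source config (illustrative names)
connector.class=io.debezium.connector.mysql.MySqlConnector
database.hostname=mysql.example.internal
database.port=3306
database.user=debezium
database.password=********
database.server.id=184054
topic.prefix=tenant1

# RegexRouter SMT: route every table topic for this tenant into one topic
transforms=route
transforms.route.type=org.apache.kafka.connect.transforms.RegexRouter
transforms.route.regex=tenant1\\.(.*)
transforms.route.replacement=tenant1-cdc
```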

However, in the Iceberg S3 tables, the schema does not have the PK
identified as an identifier field, even with either of the following two
properties set, as defined in
https://github.com/apache/iceberg/blob/c4ba60d27b02d8618621ad701e52d51b9a98d0d5/docs/docs/kafka-connect.md

iceberg.tables.default-id-columns=id
OR
iceberg.table.<table-name>.id-columns=id
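A rough sketch of the sink config we are testing with is below; the connector class and Glue catalog settings are our best reading of the linked doc, and the warehouse path and table names are placeholders, not our real values:

```properties
# Hypothetical Iceberg sink config (illustrative names)
connector.class=org.apache.iceberg.connect.IcebergSinkConnector
topics=tenant1-cdc
iceberg.catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
iceberg.catalog.warehouse=s3://our-lake/warehouse
iceberg.tables=db.tenant1_table

# The identifier-field properties we tried, per the linked doc:
iceberg.tables.default-id-columns=id
# or, per table:
iceberg.table.db.tenant1_table.id-columns=id
```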

Data written to the Iceberg table is always appended; an update event does
NOT result in an upsert.

Could you please let us know whether Iceberg supports updates and deletes
in CDC pipelines?
Any information on how to set up the source and sink connectors would be
appreciated.

We have already spent a lot of time on this with AI tools.

rajans
