kumarpritam863 opened a new pull request, #14584:
URL: https://github.com/apache/iceberg/pull/14584
**Summary**
This PR adds support for default values in Iceberg Kafka Connect. Default
values are **automatically extracted** from Kafka Connect schemas and applied
**during both auto-table creation and schema evolution**. They are applied
only when the target Iceberg table uses format version 3 or higher, because
format v3 is the version that introduced native support for column defaults.
**Background**
Iceberg format v3 introduced support for initial and write default values
on columns. A column added to a table with a default value can carry two
kinds of default:
- **Initial default**: the value read for existing rows that do not have the
column
- **Write default**: the value written when no value is explicitly provided
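The difference between the two defaults can be illustrated with a small, self-contained sketch. This is a toy model only: `ColumnDefaultsSketch`, `Column`, `read`, and `write` are hypothetical names invented for illustration and are not part of Iceberg's API, and rows are modelled as plain maps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of the two kinds of column default; this is not Iceberg's API. */
final class ColumnDefaultsSketch {

    /** Hypothetical column descriptor carrying both defaults. */
    static final class Column {
        final String name;
        final Object initialDefault;
        final Object writeDefault;

        Column(String name, Object initialDefault, Object writeDefault) {
            this.name = name;
            this.initialDefault = initialDefault;
            this.writeDefault = writeDefault;
        }
    }

    /** Read path: a row stored before the column existed yields the initial default. */
    static Object read(Map<String, Object> storedRow, Column column) {
        return storedRow.containsKey(column.name)
            ? storedRow.get(column.name)
            : column.initialDefault;
    }

    /** Write path: a record that omits the column is stored with the write default. */
    static Map<String, Object> write(Map<String, Object> incoming, Column column) {
        Map<String, Object> stored = new LinkedHashMap<>(incoming);
        stored.putIfAbsent(column.name, column.writeDefault);
        return stored;
    }

    public static void main(String[] args) {
        Column name = new Column("name", "unknown", "unknown");
        Map<String, Object> rowBeforeColumnExisted = Map.of("id", 1);
        System.out.println(read(rowBeforeColumnExisted, name)); // prints unknown
        System.out.println(write(Map.of("id", 2), name));       // prints {id=2, name=unknown}
    }
}
```

In this model the initial default never touches stored data; it only affects what a reader sees for older rows, while the write default is materialized into newly written rows.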
Kafka Connect schemas also support default values, exposed through the
defaultValue() method on each field's schema. This PR bridges the two systems
by automatically transferring default values from Kafka Connect to Iceberg
tables when a table is auto-created or its schema evolves.
**Behavior**
**Auto-Table Creation**
When Kafka Connect auto-creates a new Iceberg table:
1. If creating a format v3+ table: Default values from Kafka Connect
schemas are extracted and applied
2. If creating a format v2 or v1 table: Default values are ignored (not
supported)
**Schema Evolution on Existing Tables**
When adding new columns to an existing table:
1. If the table is format v3+: Default values are extracted and applied to
new columns
2. If the table is format v2 or v1: Default values are ignored, and the
omission is logged
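The rules above for both table creation and schema evolution reduce to a single version gate. The following is a minimal sketch under stated assumptions, not the PR's actual code: `DefaultValueGate`, `supportsDefaults`, and `defaultToApply` are hypothetical names, and logging is reduced to a line on standard error.

```java
import java.util.Optional;

/** Sketch of the format-version gate; names are illustrative, not from the PR. */
final class DefaultValueGate {

    /** First format version with column defaults, per the description above. */
    static final int MIN_FORMAT_VERSION_FOR_DEFAULTS = 3;

    /** Uses >= so that later format versions are accepted as well. */
    static boolean supportsDefaults(int formatVersion) {
        return formatVersion >= MIN_FORMAT_VERSION_FOR_DEFAULTS;
    }

    /**
     * Returns the default to attach to a new column, or empty when the source
     * schema has no default or the table format cannot store one.
     */
    static Optional<Object> defaultToApply(int formatVersion, String column, Object connectDefault) {
        if (connectDefault == null) {
            return Optional.empty();
        }
        if (!supportsDefaults(formatVersion)) {
            // Stand-in for the log message emitted when a default is ignored.
            System.err.printf(
                "Ignoring default for column %s: format v%d does not support defaults%n",
                column, formatVersion);
            return Optional.empty();
        }
        return Optional.of(connectDefault);
    }

    public static void main(String[] args) {
        System.out.println(defaultToApply(3, "name", "unknown")); // prints Optional[unknown]
        System.out.println(defaultToApply(2, "name", "unknown")); // prints Optional.empty
    }
}
```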
**Example**
Given a Kafka Connect schema:
    import org.apache.kafka.connect.data.Schema;
    import org.apache.kafka.connect.data.SchemaBuilder;

    Schema schema = SchemaBuilder.struct()
        .field("id", Schema.INT32_SCHEMA)
        .field("name", SchemaBuilder.string().defaultValue("unknown").build())
        .field("age", SchemaBuilder.int32().defaultValue(0).build())
        .field("active", SchemaBuilder.bool().defaultValue(true).build())
        .build();
For a format v3 table:
- New columns name, age, active will have their default values set
- Existing rows will see "unknown", 0, true for these columns
- New writes without these fields will also use the defaults
For a format v2 table:
- New columns are added without defaults (null/absent for existing rows)
**Compatibility**
✅ Backward Compatible:
- No breaking API changes
- Format v1/v2 tables continue to work as before (no defaults)
- Only format v3+ tables gain default value support
✅ Forward Compatible:
- The design accommodates future format versions (v4+) automatically
- The format version check uses >= rather than ==
✅ Safe Fallback:
- If default value conversion fails, logs a warning and continues without
the default
- Prevents schema evolution failures due to unsupported default value types
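The safe-fallback behavior can be sketched as a conversion wrapper that never propagates a failure. This is an illustration only: `SafeDefaultConversion` and `tryConvert` are hypothetical names, the converter is passed in as a plain function rather than being the connector's real conversion logic, and `java.util.logging` stands in for whatever logger the connector uses.

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.logging.Logger;

/** Sketch of warn-and-continue default conversion; names are illustrative. */
final class SafeDefaultConversion {

    private static final Logger LOG =
        Logger.getLogger(SafeDefaultConversion.class.getName());

    /**
     * Applies the converter to a Kafka Connect default value. On failure it
     * logs a warning and returns empty, so the column is still added, just
     * without a default.
     */
    static <T> Optional<T> tryConvert(
            String column, Object connectDefault, Function<Object, T> converter) {
        if (connectDefault == null) {
            return Optional.empty();
        }
        try {
            return Optional.ofNullable(converter.apply(connectDefault));
        } catch (RuntimeException e) {
            LOG.warning("Could not convert default for column " + column
                + ", continuing without it: " + e);
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // A supported value converts normally.
        System.out.println(tryConvert("age", 0, v -> (Integer) v));      // prints Optional[0]
        // A mismatched value is logged and dropped instead of failing evolution.
        System.out.println(tryConvert("age", "oops", v -> (Integer) v)); // prints Optional.empty
    }
}
```

Returning an empty `Optional` rather than rethrowing keeps the decision in one place: the caller adds the column either way and attaches a default only when one is present.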
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]