AntiO2 opened a new issue, #16698:
URL: https://github.com/apache/iceberg/issues/16698
### Feature Request / Improvement
### Feature request
Support Spark SQL DDL syntax for Iceberg V3 column default values.
Iceberg V3 supports column default values in table metadata, and Spark can
already consume existing Iceberg defaults when writing rows with the `DEFAULT`
keyword. However, Spark SQL DDL currently cannot create or evolve Iceberg
column defaults.
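For reference, the write path described above as already working might look like the sketch below. It assumes `local.db.t` already carries a write default on `data` that was set outside Spark SQL DDL (for example, through the Iceberg Java API). The table name and values are illustrative only.

```sql
-- Assumption: local.db.t (id INT, data STRING) is a format-version 3 table
-- whose `data` column already has a write default in its Iceberg metadata.
-- The DEFAULT keyword asks Spark to fill in that existing default on write.
INSERT INTO local.db.t VALUES (1, DEFAULT);
```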
The requested syntax includes:
```sql
CREATE TABLE local.db.t (
  id INT,
  data STRING DEFAULT 'default-value'
)
USING iceberg
TBLPROPERTIES ('format-version' = '3');
```
```sql
ALTER TABLE local.db.t
  ADD COLUMN data STRING DEFAULT 'default-value';
```
```sql
ALTER TABLE local.db.t
  ALTER COLUMN data SET DEFAULT 'new-value';
```
### Current behavior
Using the latest Iceberg `main` with Spark 4.1.2, the following operations
fail:
```sql
CREATE TABLE local.db.t_create (
  id INT,
  data STRING DEFAULT 'default-value'
)
USING iceberg
TBLPROPERTIES ('format-version' = '3');
```
fails with:
```text
[UNSUPPORTED_FEATURE.TABLE_OPERATION] Table `local`.`db`.`t_create` does not
support column default value.
```
```sql
ALTER TABLE local.db.t_alter
  ADD COLUMN data STRING DEFAULT 'default-value';
```
fails with:
```text
Cannot add column data since setting default values in Spark is currently
unsupported
```
```sql
ALTER TABLE local.db.t_modify
  ALTER COLUMN data SET DEFAULT 'new-value';
```
fails with:
```text
Cannot apply unknown table change:
org.apache.spark.sql.connector.catalog.TableChange$UpdateColumnDefaultValue
```
### Expected behavior
For Iceberg V3 tables, Spark SQL DDL should translate Spark's column default
table changes into Iceberg schema updates. Specifically:
* `CREATE TABLE ... column DEFAULT ...` should create Iceberg columns with
write defaults.
* `ALTER TABLE ... ADD COLUMN ... DEFAULT ...` should add the column with an
Iceberg write default.
* `ALTER TABLE ... ALTER COLUMN ... SET DEFAULT ...` should update the
Iceberg write default for the column.
* Unsupported cases, such as defaults on table format versions that do not
support them, should fail with clear validation errors.
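As an illustration of the last point, the statement below is one case that would be expected to be rejected. This is a sketch only: the table name `t_v2` is hypothetical, and the exact error wording is not specified in this request.

```sql
-- Hypothetical table name. The table is pinned to format version 2,
-- which is used here as an example of a version without column default support.
CREATE TABLE local.db.t_v2 (
  id INT,
  data STRING DEFAULT 'default-value'
)
USING iceberg
TBLPROPERTIES ('format-version' = '2');
-- Requested outcome: a validation error that names the format-version
-- requirement, rather than a generic unsupported-operation message.
```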
### Motivation
Column defaults are useful for schema evolution and operational
compatibility. Users often manage tables through SQL rather than Java APIs.
Without Spark SQL DDL support, users cannot create or update Iceberg V3
defaults from Spark even though the table format and Iceberg API support the
concept.
This creates an inconsistent experience:
* Iceberg table metadata can contain defaults.
* Spark SQL can write `DEFAULT` values when defaults already exist.
* Spark SQL cannot define or evolve those defaults through DDL.
Supporting this syntax would make Spark SQL a complete interface for
managing Iceberg V3 column defaults.
### Related issue
This is related to the broader default values request:
* https://github.com/apache/iceberg/issues/10761
That issue was a general request for default values support. This issue is
specifically about Spark SQL DDL support for Iceberg V3 column default metadata.
### Query engine
Spark
### Willingness to contribute
- [x] I can contribute this improvement/feature independently
- [ ] I would be willing to contribute this improvement/feature with
guidance from the Iceberg community
- [ ] I cannot contribute this improvement/feature at this time
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]