moomindani opened a new issue, #15593:
URL: https://github.com/apache/iceberg/issues/15593
## Motivation
The `create_changelog_view` procedure lets users query changelog rows from
an Iceberg table, but producing a Slowly Changing Dimensions Type-2 (SCD
Type-2) history view still requires users to write non-trivial SQL manually:
joining with snapshot metadata, applying LEAD window functions, and filtering
intermediate rows. This is a recurring pattern across many analytical workloads.
This issue proposes adding a `scd_type2` parameter to
`create_changelog_view` so Iceberg can produce an SCD Type-2 view natively.
## Proposed API
```sql
CALL catalog.system.create_changelog_view(
table => 'db.products',
changelog_view => 'products_history',
identifier_columns => array('product_id'),
scd_type2 => true
)
```
## New columns added to the view
| Column | Type | Nullable | Description |
|---|---|---|---|
| `_valid_from` | TIMESTAMP | no | Commit timestamp when this version became
active |
| `_valid_to` | TIMESTAMP | yes | Commit timestamp when superseded; `NULL`
if still active |
| `_is_current` | BOOLEAN | no | `true` when `_valid_to IS NULL AND
_change_type != 'DELETE'` |
## Design choices
- **`NULL` for open-ended rows** — aligns with dbt
(`dbt_valid_from`/`dbt_valid_to`) and Databricks DLT (`__START_AT`/`__END_AT`)
- **`_valid_from`/`_valid_to` naming** — closest to dbt and Kimball
semantics; prefixed with `_` to match Iceberg metadata column conventions
- **`_is_current` retained** — DELETE rows have `_valid_to IS NULL` but are
NOT current; the convenience flag avoids the two-condition footgun
- **DELETE rows kept** — marks hard deletes with `_is_current = false`,
aligns with dbt `hard_deletes='new_record'`
- **`UPDATE_BEFORE` rows filtered** — they are intermediate computation
artifacts
## Constraints
- `scd_type2 = true` requires `identifier_columns` (explicit or from table
schema identifier fields)
- `scd_type2 = true` and `net_changes = true` are mutually exclusive
- `scd_type2 = true` implicitly forces `compute_updates = true`
## Query engine
Spark
## Willingness to contribute
- [x] I can contribute this improvement/feature independently
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]