malinjawi opened a new issue, #12039:
URL: https://github.com/apache/gluten/issues/12039
## Summary
Add UniForm Iceberg support for Delta tables on the Velox backend.
Today Gluten `main` documents `Iceberg readers (UniForm)` as `Not tested` in
`docs/get-started/VeloxDelta.md`, while Delta Lake already defines the
write-side and metadata-generation contract for UniForm Iceberg.
This issue tracks enabling and validating the supported path rather than
only carrying partial hooks.
## Motivation
Delta UniForm allows Delta tables to be read as Iceberg tables by generating
Iceberg metadata asynchronously after Delta commits.
For Gluten, this is a useful interoperability feature because:
- Delta native write support already exists on the Velox path
- Velox already has native Iceberg read/write machinery and Parquet field-id
support
- Gluten already carries partial IcebergCompatV2-related hooks in the Delta
write path
However, Gluten does not currently have end-to-end validation or a support
claim for UniForm Iceberg.
## Current State In Gluten
What already exists on `main`:
- Delta support matrix row: `Iceberg readers (UniForm) | ... | Not tested`
- Delta write-path hooks for `IcebergCompatV2`:
  - materialize partition columns into Parquet data files
  - tag `AddFile` entries with Iceberg compat version
  - force `TIMESTAMP_MICROS`
  - set `DeltaParquetWriteSupport` for IcebergCompatV2
- Velox native Parquet writer already supports explicit Parquet field IDs
- Velox native Iceberg code already supports nested field descriptors and
partition specs
What appears missing:
- no `delta-iceberg` dependency/test enablement in Gluten build/test flow
- no Velox end-to-end UniForm test coverage
- no verification that Gluten native Delta write passes nested field IDs
required by `IcebergCompatV2` into the native Parquet writer
- no validation for Delta UniForm restrictions such as active deletion
vectors
- no documentation/support upgrade from `Not tested` to a defined supported
scope
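
To make the nested field ID gap concrete, here is a minimal Python sketch that walks a simplified schema and lists the array element and map key/value positions that would each need their own Parquet field ID. The dict-based schema encoding, the function name `nested_id_paths`, and the `col.element` / `col.key` / `col.value` path convention are assumptions for illustration only; they are not Gluten, Delta, or Velox APIs.

```python
from typing import Iterator

# Hypothetical, simplified schema encoding (not Spark's StructType):
#   primitive: "int", "string", ...
#   array:     {"type": "array", "element": <type>}
#   map:       {"type": "map", "key": <type>, "value": <type>}
#   struct:    {"type": "struct", "fields": [{"name": ..., "type": <type>}]}

def nested_id_paths(name: str, dtype) -> Iterator[str]:
    """Yield paths of array/map sub-fields that need their own field ID."""
    if not isinstance(dtype, dict):
        return  # primitive: only the column's own ID is needed
    kind = dtype["type"]
    if kind == "array":
        path = f"{name}.element"
        yield path
        yield from nested_id_paths(path, dtype["element"])
    elif kind == "map":
        for part in ("key", "value"):
            path = f"{name}.{part}"
            yield path
            yield from nested_id_paths(path, dtype[part])
    elif kind == "struct":
        # Struct members are named fields with their own IDs; recurse only
        # to find arrays/maps nested inside them.
        for field in dtype["fields"]:
            yield from nested_id_paths(f"{name}.{field['name']}", field["type"])

# Illustrative column: tags MAP<STRING, ARRAY<INT>>
schema = {"type": "map", "key": "string",
          "value": {"type": "array", "element": "int"}}
paths = list(nested_id_paths("tags", schema))
```

A verification test could compare a list like `paths` against the field IDs actually present in the Parquet footer written by the native writer.
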
## Delta / Protocol Requirements Relevant Here
Based on Delta UniForm docs and Delta protocol `IcebergCompatV2`
requirements, the supported path needs at least:
- column mapping enabled
- `minReaderVersion >= 2`
- `minWriterVersion >= 7`
- `delta.enableIcebergCompatV2=true`
- `delta.universalFormat.enabledFormats=iceberg`
- Delta 3.1+ writer
- Hive Metastore-backed Iceberg catalog path for the normal read flow
- no active deletion vectors on the UniForm-enabled table
- partition columns materialized in Parquet
- all new `AddFile`s populated with `numRecords`
- timestamp columns written as int64 / micros
- nested array/map field IDs written into Parquet schema
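
The table-level part of this checklist can be expressed as a small pre-write check. The following is a minimal Python sketch, assuming the table properties and protocol versions are already available as plain dictionaries; the function name `uniform_iceberg_violations` and its parameters are hypothetical, and the file-level requirements (materialized partition columns, `numRecords`, timestamp encoding, nested field IDs) are deliberately out of its scope.

```python
from typing import Dict, List

def uniform_iceberg_violations(props: Dict[str, str],
                               protocol: Dict[str, int],
                               has_active_deletion_vectors: bool) -> List[str]:
    """Return human-readable reasons why a table fails the checklist above."""
    problems = []
    if props.get("delta.columnMapping.mode", "none") not in ("name", "id"):
        problems.append("column mapping must be enabled")
    if protocol.get("minReaderVersion", 0) < 2:
        problems.append("minReaderVersion must be >= 2")
    if protocol.get("minWriterVersion", 0) < 7:
        problems.append("minWriterVersion must be >= 7")
    if props.get("delta.enableIcebergCompatV2", "false").lower() != "true":
        problems.append("delta.enableIcebergCompatV2 must be true")
    formats = [f.strip().lower() for f in
               props.get("delta.universalFormat.enabledFormats", "").split(",")]
    if "iceberg" not in formats:
        problems.append("delta.universalFormat.enabledFormats must include iceberg")
    if has_active_deletion_vectors:
        problems.append("table has active deletion vectors")
    return problems

# Illustrative inputs only.
good_props = {
    "delta.columnMapping.mode": "name",
    "delta.enableIcebergCompatV2": "true",
    "delta.universalFormat.enabledFormats": "iceberg",
}
good_protocol = {"minReaderVersion": 2, "minWriterVersion": 7}
ok = uniform_iceberg_violations(good_props, good_protocol, False)
with_dv = uniform_iceberg_violations(good_props, good_protocol, True)
```

An empty result means the table-level requirements are met; a non-empty one gives the reasons that a reject-or-fall-back decision could report.
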
## Proposed Scope
Enable the minimum viable supported path for UniForm Iceberg on Velox Delta
write:
1. Build/runtime enablement
   - add the required Delta UniForm Iceberg package/dependency support
     (`delta-iceberg`) in the relevant build/test profile
2. Native Delta write plumbing
   - verify and, if needed, add plumbing from Delta schema metadata to Velox
     Parquet `field_id` assignment for nested array/map fields
   - keep Delta’s async UniForm metadata generation path intact after commit
3. Validation / fallback rules
   - explicitly validate or fall back for unsupported cases such as:
     - active deletion vectors
     - conflicting IcebergCompatV1 state
     - unsupported upgrade/rewrite paths that need
       `REORG TABLE ... APPLY (UPGRADE UNIFORM(...))`
4. End-to-end tests
   - create a Delta table with UniForm Iceberg enabled
   - write through Gluten native Delta write
   - verify Iceberg metadata generation artifacts and status properties
   - read the table through Iceberg and validate results
   - add negative coverage for restricted cases, especially deletion vectors
5. Docs/support matrix
   - update `docs/get-started/VeloxDelta.md` only after the tested scope is
     clear
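
As an illustration of the first end-to-end test step, the table setup could be generated roughly as follows. This is a minimal Python sketch that only builds the `CREATE TABLE` statement; the helper name `uniform_create_table_sql`, the table name, and the columns are hypothetical, and choosing `name` as the column mapping mode is an assumption.

```python
from typing import Dict

def uniform_create_table_sql(table: str, columns: Dict[str, str]) -> str:
    """Build a CREATE TABLE statement for a UniForm Iceberg-enabled Delta table."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    props = {
        "delta.columnMapping.mode": "name",  # assumed mode; column mapping is required
        "delta.enableIcebergCompatV2": "true",
        "delta.universalFormat.enabledFormats": "iceberg",
    }
    prop_sql = ", ".join(f"'{k}' = '{v}'" for k, v in props.items())
    return f"CREATE TABLE {table} ({cols}) USING DELTA TBLPROPERTIES ({prop_sql})"

# Illustrative table with a nested column, so nested field IDs are exercised.
ddl = uniform_create_table_sql(
    "uniform_demo", {"id": "INT", "tags": "MAP<STRING, ARRAY<INT>>"}
)
```

In a test suite the resulting string would be passed to `spark.sql(...)`, followed by a native write and an Iceberg read of the same table.
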
## Likely Implementation Areas
- `backends-velox/src-delta33/main/scala/org/apache/spark/sql/delta/GlutenOptimisticTransaction.scala`
- `backends-velox/src-delta33/main/scala/org/apache/spark/sql/delta/GlutenDeltaParquetFileFormat.scala`
- native Velox Parquet write plumbing under `cpp/velox/operators/writer` /
writer utils
- Delta/Iceberg test harness under `backends-velox/src-delta33/test` and/or
`src-delta40/test`
## Non-Goals For Initial Work
- UniForm Hudi support
- full historical table rewrite/upgrade coverage beyond the minimum
supported path
- broad support claims for all Delta feature combinations
## Acceptance Criteria
- Gluten can write a UniForm Iceberg-enabled Delta table on the supported
Velox path
- Iceberg metadata is generated and the table can be read through Iceberg
successfully
- the implementation respects Delta `IcebergCompatV2` requirements
- unsupported combinations are explicitly rejected or fall back cleanly
- docs move from `Not tested` to a precise supported statement