malinjawi opened a new issue, #12039:
URL: https://github.com/apache/gluten/issues/12039

   ## Summary
   
   Add UniForm Iceberg support for Delta tables on the Velox backend.
   
   Today Gluten `main` documents `Iceberg readers (UniForm)` as `Not tested` in 
`docs/get-started/VeloxDelta.md`, while Delta Lake already defines the 
write-side and metadata-generation contract for UniForm Iceberg.
   
   This issue tracks enabling and validating the supported path rather than 
only carrying partial hooks.
   
   ## Motivation
   
   Delta UniForm allows Delta tables to be read as Iceberg tables by generating 
Iceberg metadata asynchronously after Delta commits.
   
   For Gluten, this is a useful interoperability feature because:
   - Delta native write support already exists on the Velox path
   - Velox already has native Iceberg read/write machinery and Parquet field-id 
support
   - Gluten already carries partial IcebergCompatV2-related hooks in the Delta 
write path
   
   However, Gluten does not currently have end-to-end validation or a support 
claim for UniForm Iceberg.
   
   ## Current State In Gluten
   
   What already exists on `main`:
   - Delta support matrix row: `Iceberg readers (UniForm) | ... | Not tested`
   - Delta write-path hooks for `IcebergCompatV2`:
     - materialize partition columns into Parquet data files
     - tag `AddFile` entries with Iceberg compat version
     - force `TIMESTAMP_MICROS`
     - set `DeltaParquetWriteSupport` for IcebergCompatV2
   - Velox native Parquet writer already supports explicit Parquet field IDs
   - Velox native Iceberg code already supports nested field descriptors and 
partition specs
   
   What appears missing:
   - no `delta-iceberg` dependency/test enablement in Gluten build/test flow
   - no Velox end-to-end UniForm test coverage
   - no verification that Gluten native Delta write passes nested field IDs 
required by `IcebergCompatV2` into the native Parquet writer
   - no validation for Delta UniForm restrictions such as active deletion 
vectors
   - no documentation/support upgrade from `Not tested` to a defined supported 
scope
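
The nested field-ID gap listed above could be checked with a small schema walk. This is a minimal sketch, not Gluten or Velox code: the dict-based schema model, the `field_id` key, and the helper name `missing_nested_field_ids` are all hypothetical and stand in for whatever schema representation reaches the native Parquet writer.

```python
from typing import Iterator

# Hypothetical, simplified schema model for illustration only. It is not the
# real Delta, Spark, or Velox schema representation.
def missing_nested_field_ids(field: dict, path: str = "") -> Iterator[str]:
    """Yield the dotted path of every schema node that carries no field ID."""
    name = field["name"]
    here = f"{path}.{name}" if path else name
    if field.get("field_id") is None:
        yield here
    # Recurse into array elements, map keys/values, and struct members alike.
    for child in field.get("children", []):
        yield from missing_nested_field_ids(child, here)

# A map<string, array<int>> column where the array element lost its ID.
schema = {
    "name": "tags", "type": "map", "field_id": 1,
    "children": [
        {"name": "key", "type": "string", "field_id": 2},
        {"name": "value", "type": "array", "field_id": 3,
         "children": [{"name": "element", "type": "int"}]},
    ],
}

print(list(missing_nested_field_ids(schema)))  # ['tags.value.element']
```

A test could run such a check against the schema handed to the writer and fail if the returned list is non-empty.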
   
   ## Delta / Protocol Requirements Relevant Here
   
   Based on Delta UniForm docs and Delta protocol `IcebergCompatV2` 
requirements, the supported path needs at least:
   - column mapping enabled (`delta.columnMapping.mode` set to `name` or `id`)
   - `minReaderVersion >= 2`
   - `minWriterVersion >= 7`
   - `delta.enableIcebergCompatV2=true`
   - `delta.universalFormat.enabledFormats=iceberg`
   - Delta 3.1+ writer
   - Hive Metastore-backed Iceberg catalog path for the normal read flow
   - no active deletion vectors on the UniForm-enabled table
   - partition columns materialized in Parquet
   - all new `AddFile`s populated with `numRecords`
   - timestamp columns written as int64 / micros
   - nested array/map field IDs written into Parquet schema
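
The property and protocol-version items in this list lend themselves to a simple pre-check. The sketch below is only an illustration under stated assumptions: the function name `check_uniform_prereqs` and its inputs (a plain dict of table properties plus two integers) are hypothetical, and it covers only the configuration items, not the file-level requirements such as `numRecords` or timestamp encoding.

```python
def check_uniform_prereqs(props: dict, min_reader: int, min_writer: int) -> list:
    """Return a list of unmet configuration requirements (empty means OK)."""
    problems = []
    # Column mapping must be on; "none" is Delta's default mode.
    if props.get("delta.columnMapping.mode", "none") not in ("name", "id"):
        problems.append("column mapping must be enabled (name or id mode)")
    if min_reader < 2:
        problems.append("minReaderVersion must be >= 2")
    if min_writer < 7:
        problems.append("minWriterVersion must be >= 7")
    if props.get("delta.enableIcebergCompatV2", "false").lower() != "true":
        problems.append("delta.enableIcebergCompatV2 must be true")
    # enabledFormats is treated here as a comma-separated list.
    formats = props.get("delta.universalFormat.enabledFormats", "")
    if "iceberg" not in [f.strip() for f in formats.split(",")]:
        problems.append("delta.universalFormat.enabledFormats must include iceberg")
    return problems

good = {
    "delta.columnMapping.mode": "name",
    "delta.enableIcebergCompatV2": "true",
    "delta.universalFormat.enabledFormats": "iceberg",
}
print(check_uniform_prereqs(good, min_reader=2, min_writer=7))  # []
print(check_uniform_prereqs({}, min_reader=1, min_writer=2))
```

Returning every unmet requirement at once, rather than stopping at the first, makes a failed test or fallback log easier to act on.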
   
   ## Proposed Scope
   
   Enable the minimum viable supported path for UniForm Iceberg on Velox Delta 
write:
   
   1. Build/runtime enablement
   - add the required Delta UniForm Iceberg package/dependency support 
(`delta-iceberg`) in the relevant build/test profile
   
   2. Native Delta write plumbing
   - verify and, if needed, add plumbing from Delta schema metadata to Velox 
Parquet `field_id` assignment for nested array/map fields
   - keep Delta’s async UniForm metadata generation path intact after commit
   
   3. Validation / fallback rules
   - detect unsupported cases and explicitly reject them or fall back, for example:
     - active deletion vectors
     - conflicting IcebergCompatV1 state
     - unsupported upgrade/rewrite paths that need `REORG TABLE ... APPLY 
(UPGRADE UNIFORM(...))`
   
   4. End-to-end tests
   - create a Delta table with UniForm Iceberg enabled
   - write through Gluten native Delta write
   - verify Iceberg metadata generation artifacts and status properties
   - read the table through Iceberg and validate results
   - add negative coverage for restricted cases, especially deletion vectors
   
   5. Docs/support matrix
   - update `docs/get-started/VeloxDelta.md` only after the tested scope is 
clear
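
The validation / fallback rules in item 3 could take the shape of a single decision function consulted before the native write is chosen. This is a sketch under assumptions, not the proposed implementation: `TableState`, its fields, and `fallback_reason` are hypothetical names, and the policy of falling back to the vanilla Spark path (rather than rejecting outright) is one possible choice that the issue leaves open.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TableState:
    """Hypothetical summary of the table facts the rules depend on."""
    uniform_iceberg_enabled: bool
    has_active_deletion_vectors: bool = False
    iceberg_compat_v1_enabled: bool = False
    needs_reorg_upgrade: bool = False

def fallback_reason(state: TableState) -> Optional[str]:
    """Return why the native write should not be used, or None if it may be."""
    if not state.uniform_iceberg_enabled:
        return None  # No UniForm-specific restriction applies.
    if state.has_active_deletion_vectors:
        return "active deletion vectors on a UniForm-enabled table"
    if state.iceberg_compat_v1_enabled:
        return "conflicting IcebergCompatV1 state"
    if state.needs_reorg_upgrade:
        return "table requires a REORG TABLE upgrade first"
    return None

print(fallback_reason(TableState(uniform_iceberg_enabled=True)))  # None
print(fallback_reason(TableState(True, has_active_deletion_vectors=True)))
```

Keeping the rules in one function gives the negative tests in item 4 a single place to target.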
   
   ## Likely Implementation Areas
   
   - `backends-velox/src-delta33/main/scala/org/apache/spark/sql/delta/GlutenOptimisticTransaction.scala`
   - `backends-velox/src-delta33/main/scala/org/apache/spark/sql/delta/GlutenDeltaParquetFileFormat.scala`
   - native Velox Parquet write plumbing under `cpp/velox/operators/writer` / 
writer utils
   - Delta/Iceberg test harness under `backends-velox/src-delta33/test` and/or 
`src-delta40/test`
   
   ## Non-Goals For Initial Work
   
   - UniForm Hudi support
   - full historical table rewrite/upgrade coverage beyond the minimum 
supported path
   - broad support claims for all Delta feature combinations
   
   ## Acceptance Criteria
   
   - Gluten can write a UniForm Iceberg-enabled Delta table on the supported 
Velox path
   - Iceberg metadata is generated and the table can be read through Iceberg 
successfully
   - the implementation respects Delta `IcebergCompatV2` requirements
   - unsupported combinations are explicitly rejected or fall back cleanly
   - docs move from `Not tested` to a precise supported statement
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

