malinjawi opened a new issue, #11901: URL: https://github.com/apache/gluten/issues/11901
### Description

This issue tracks the native execution gaps in Delta Lake MoR (Merge-on-Read) read and DML paths on the Velox backend, focusing on deletion-vector semantics, fallback reduction, and performance.

The goal is to move Delta MoR in Gluten from partial prototype status to stable native execution.

This tracker is intended to organize the work and align with [Delta Lake’s MoR design](https://docs.google.com/document/d/1lv35ZPfioopBbzQ7zT82LOev7qV7x4YNLkMr2-L5E_M/edit?tab=t.0#heading=h.z89r7ifgftsi), deletion vector protocol, and Gluten’s lakehouse integration.

Related issues:
- [VL] Unified design for data lake read support in Gluten + Velox #3378
- [VL] Delta read enhancement #10377
- [VL] Delta write support #10215

Related context:
- Gluten 2026 Roadmap #11827

### Current status

Current PoC work has already made progress in the following areas:
- Delta DV/MoR read foundation has been prototyped in Gluten + Velox.
- Native DV scan/read path is partially working.
- Fallback still exists in some control-plane and non-hot-path operations.
- Native DELETE path has been explored, but MoR DML is not yet complete.
- UPDATE / MERGE are not yet fully native.

### Scope

This issue tracks Delta MoR work in the Velox backend, including:
- native reads for Delta tables with deletion vectors
- native DML paths that generate or update deletion vectors
- protocol correctness for DV descriptors and action handling
- performance and fallback reduction for MoR workloads

Out of scope:
- generic Delta CoW improvements unless directly required by MoR
- non-Velox backend work
- unrelated lakehouse features

### Priority

#### P0
- [ ] Native DV read correctness
- [ ] Native MoR read execution with minimal/zero fallback on supported queries
- [ ] Stable build/runtime validation in clean environments

#### P1
- [ ] Native DELETE support for MoR
- [ ] Correct handling of files with existing deletion vectors
- [ ] Reduction of control-plane overhead and fallback in MoR workloads (e.g. Delta helper queries, JSON/log handling, histogram aggregation)

#### P2
- [ ] Native UPDATE support for MoR
- [ ] Native MERGE support for MoR
- [ ] Broader MoR performance optimization and workload coverage

### Work areas

#### 1. MoR read path
- [ ] Complete native DV scan/read support
- [ ] Integrate Delta MoR reads cleanly into Gluten + Velox planning/execution
- [ ] Reduce fallback on supported MoR read queries

#### 2. MoR write path
- [ ] Add native DELETE support
- [ ] Add native UPDATE support
- [ ] Add native MERGE support
- [ ] Support rewriting/replacing existing DV states correctly

#### 3. Delta protocol alignment
- [ ] Align implementation with Delta deletion vector protocol semantics
- [ ] Ensure correct handling of `u` / `p` / `i` DV descriptors
- [ ] Ensure correct handling of offsets, size, checksum, and cardinality
- [ ] Ensure correct reconciliation behavior for `(path, deletionVector.uniqueId)`

#### 4. Performance
- [ ] Improve MoR read performance
- [ ] Improve MoR write performance
- [ ] Benchmark Gluten/Velox against vanilla Spark for representative MoR workloads
- [ ] Achieve competitive or improved performance over vanilla Spark in representative MoR workloads

#### 5. Testing and validation
- [ ] Add unit and integration coverage for MoR read/write paths
- [ ] Add regression coverage for DV protocol edge cases
- [ ] Validate correctness across partitioned and non-partitioned tables

### Success criteria

- Supported Delta MoR read queries execute natively with zero or near-zero fallback
- Native DV read results match vanilla Spark / Delta semantics
- DELETE path is stable and mostly native
- Gluten/Velox MoR performance is competitive or improved over vanilla Spark on representative workloads

### Gluten version

main branch

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
For additional commands, e-mail: [email protected]
