malinjawi opened a new pull request, #12215:
URL: https://github.com/apache/gluten/pull/12215

   What changes are proposed in this pull request?
   
   This PR is the next draft split in the Delta deletion-vector merge-on-read (MoR) stack. It is stacked on #12197 and #12198 and is opened early to get CI signal while those reviewer-requested scan splits continue through review.
   
   It adds a correctness-first DML row-index scan guard for Delta DV DELETE/MoR 
planning. The goal is to preserve safe fallback behavior for DML target scans 
until native row-index scan execution is proven for the required Delta table 
shapes.
   
   Main changes:
   
   - add `DeltaDeletionVectorDmlUtils` to detect Delta DML row-index scan shapes
   - guard Delta post-transform planning so DML row-index scans do not accidentally move under native execution when the native DML path is disabled
   - preserve Delta internal row-index/file-path columns needed by DML target scans
   - add focused Delta 3.3 and Delta 4.0 coverage for fallback plan shape
   - add repeated DELETE coverage over an existing deletion vector, verifying the active DV cardinality advances and final read results remain correct
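   
   To make the guard in the list above concrete, here is a minimal sketch of the decision it describes. It is not the code in this PR: the class name, method names, the `Placement` enum, and both column names are assumptions chosen for illustration, and a plain list of column names stands in for the real Spark plan nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch only: none of these names are Gluten or Delta APIs.
final class DmlRowIndexScanGuardSketch {
  enum Placement { NATIVE_ELIGIBLE, FALLBACK }

  // Assumed names for the Delta internal columns a DML target scan needs.
  static final String ROW_INDEX_COLUMN = "__delta_internal_row_index";
  static final String FILE_PATH_COLUMN = "filePath";
  static final Set<String> INTERNAL_COLUMNS = Set.of(ROW_INDEX_COLUMN, FILE_PATH_COLUMN);

  /** Treats a scan as a DML row-index scan when it outputs the row-index column. */
  static boolean isDmlRowIndexScan(List<String> scanOutput) {
    return scanOutput.contains(ROW_INDEX_COLUMN);
  }

  /** Keeps a DML row-index scan on the fallback path unless the native DML path is enabled. */
  static Placement place(List<String> scanOutput, boolean nativeDmlEnabled) {
    if (isDmlRowIndexScan(scanOutput) && !nativeDmlEnabled) {
      return Placement.FALLBACK;
    }
    return Placement.NATIVE_ELIGIBLE;
  }

  /** Returns the pruned columns plus any internal columns, in the original order. */
  static List<String> preserveInternalColumns(List<String> original, List<String> pruned) {
    List<String> kept = new ArrayList<>();
    for (String column : original) {
      if (pruned.contains(column) || INTERNAL_COLUMNS.contains(column)) {
        kept.add(column);
      }
    }
    return kept;
  }
}
```

   In this sketch, a scan that outputs the row-index column stays on the fallback path while the native DML flag is off, and an ordinary read scan is unaffected. The real utility inspects plan nodes rather than column-name lists, so treat this only as a statement of the intended invariant.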
   
   This PR is intentionally safety-only:
   
   - no DELETE command routing
   - no native bitmap aggregation enablement
   - no plain Parquet target-scan optimization
   - no performance shortcut for DML scan planning
   
   Benchmark status:
   
   This PR preserves Spark fallback for the protected DML row-index scan shape, so it does not claim a native performance win. Before marking it ready for review, I plan to attach scan-phase evidence for the guarded path: planning time, scan execution time, files and rows touched, fallback reason, and columnar-to-row (C2R) cost. Native DML row-index scan benchmarking remains a follow-up once the native path is enabled behind a separate gate.
   
   Issue: #11901
   
   How was this patch tested?
   
   Validation run locally on 2026-06-01:
   
   - `env JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home PATH=/opt/homebrew/opt/openjdk@17/bin:$PATH ./build/mvn -q test-compile -pl backends-velox -am -Pjava-17,spark-3.5,backends-velox,hadoop-3.3,spark-ut,delta -DskipTests`
   - `env JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home PATH=/opt/homebrew/opt/openjdk@17/bin:$PATH ./build/mvn -q test-compile -pl backends-velox -am -Pjava-17,spark-4.0,scala-2.13,backends-velox,hadoop-3.3,spark-ut,delta -DskipTests`
   - `git diff --check origin/split/delta-dv-java-scan-handoff-pr-clean...HEAD`
   
   Focused runtime tests cannot run locally yet because the macOS native Gluten library (`darwin/aarch64/libgluten.dylib`) is missing, so this draft PR relies on CI for the runtime lanes until a compatible local native build is available.
   
   Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: IBM BOB
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

