malinjawi opened a new pull request, #12215: URL: https://github.com/apache/gluten/pull/12215
What changes are proposed in this pull request?

This PR is the next draft split in the Delta deletion-vector MoR stack. It is stacked after #12197 and #12198, and is opened early to get CI signal while those reviewer-requested scan splits continue through review.

It adds a correctness-first DML row-index scan guard for Delta DV DELETE/MoR planning. The goal is to preserve safe fallback behavior for DML target scans until native row-index scan execution is proven for the required Delta table shapes.

Main changes:

- add `DeltaDeletionVectorDmlUtils` to detect Delta DML row-index scan shapes
- guard Delta post-transform planning so DML row-index scans do not accidentally move under native execution when the native DML path is disabled
- preserve Delta internal row-index/file-path columns needed by DML target scans
- add focused Delta 3.3 and Delta 4.0 coverage for fallback plan shape
- add repeated DELETE coverage over an existing deletion vector, verifying the active DV cardinality advances and final read results remain correct

This PR is intentionally safety-only:

- no DELETE command routing
- no native bitmap aggregation enablement
- no plain Parquet target-scan optimization
- no performance shortcut for DML scan planning

Benchmark status:

This PR currently preserves Spark fallback for the protected DML row-index scan shape, so it is not claiming a native performance win. Before marking this ready for review, I plan to attach scan-phase evidence that reports planning time, scan execution time, files/rows touched, fallback reason, and C2R cost for the guarded path. Native DML row-index scan benchmarking remains a follow-up once the native path is enabled behind a separate gate.

Issue: #11901

How was this patch tested?
Validation run locally on 2026-06-01:

- `env JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home PATH=/opt/homebrew/opt/openjdk@17/bin:$PATH ./build/mvn -q test-compile -pl backends-velox -am -Pjava-17,spark-3.5,backends-velox,hadoop-3.3,spark-ut,delta -DskipTests`
- `env JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home PATH=/opt/homebrew/opt/openjdk@17/bin:$PATH ./build/mvn -q test-compile -pl backends-velox -am -Pjava-17,spark-4.0,scala-2.13,backends-velox,hadoop-3.3,spark-ut,delta -DskipTests`
- `git diff --check origin/split/delta-dv-java-scan-handoff-pr-clean...HEAD`

Local focused runtime execution is still blocked by the missing local macOS native Gluten library (`darwin/aarch64/libgluten.dylib`), so this draft PR is relying on CI for runtime lanes until a compatible local native build is available.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: IBM BOB
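For reviewers who want the intent of the guard in one place, the decision described under "Main changes" can be sketched roughly as below. This is an illustration only, not the code in this PR. The class name, the `Decision` enum, the `nativeDmlEnabled` flag, and the placeholder column names `row_index` and `file_path` are all hypothetical. Detecting the scan shape purely by output column name is also an assumption; the real `DeltaDeletionVectorDmlUtils` may inspect the plan differently.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, dependency-free sketch of the fallback guard; not the PR's implementation.
public class DmlRowIndexScanGuardSketch {

    // Placeholder names (assumption): the real Delta internal column names may differ.
    static final Set<String> INTERNAL_DML_COLUMNS = Set.of("row_index", "file_path");

    enum Decision { OFFLOAD_TO_NATIVE, FALL_BACK_TO_SPARK }

    // Assumption: a DML row-index scan is recognized by the presence of an
    // internal row-index or file-path column in the scan output.
    static boolean isDmlRowIndexScan(List<String> scanOutputColumns) {
        return scanOutputColumns.stream().anyMatch(INTERNAL_DML_COLUMNS::contains);
    }

    // Keep the scan on the Spark path whenever it looks like a DML row-index
    // scan and the (hypothetical) native DML gate is off.
    static Decision decide(List<String> scanOutputColumns, boolean nativeDmlEnabled) {
        if (isDmlRowIndexScan(scanOutputColumns) && !nativeDmlEnabled) {
            return Decision.FALL_BACK_TO_SPARK;
        }
        return Decision.OFFLOAD_TO_NATIVE;
    }

    public static void main(String[] args) {
        // Ordinary read scan: no internal columns, so the guard does not apply.
        System.out.println(decide(List.of("id", "value"), false));
        // DML target scan with the native DML gate off: guarded, stays on Spark.
        System.out.println(decide(List.of("id", "row_index", "file_path"), false));
        // Same scan with the gate on: the guard lets it through.
        System.out.println(decide(List.of("id", "row_index", "file_path"), true));
    }
}
```

The sketch defaults to fallback only for the protected shape and leaves every other scan untouched, which matches the safety-only scope stated above.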
