malinjawi opened a new pull request, #12217:
URL: https://github.com/apache/gluten/pull/12217

   ## What changes
   
   This is the next stacked Delta DV MoR slice after #12216. It adds a focused benchmark harness for persistent deletion-vector DELETE so we can measure the current correctness path before enabling native bitmap construction or target-scan shortcuts.
   
   Stack order:
   
   1. #12197 - DV scan info extraction utility
   2. #12198 - JVM Delta DV scan handoff
   3. #12215 - DML row-index scan safety
   4. #12216 - persistent DV DELETE correctness path
   5. This PR - focused DELETE DV diagnostics benchmark
   
   This PR should remain a draft until the earlier correctness PR (#12216) has been validated by native CI and we attach runtime benchmark output from CI or a local native build.
   
   ## Scope
   
   - Adds `DeltaDeleteDeletionVectorBenchmark` for Delta 3.3 and Delta 4.0.
   - Measures the Spark DELETE DV baseline against Gluten DELETE DV with native write and DML row-index scan enabled.
   - Covers create-DV and update-existing-DV modes.
   - Validates correctness during the benchmark by checking active files, files with DVs, DV cardinality, and payload bytes.
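
The four correctness counters named in the last bullet can be illustrated with a small, self-contained sketch. This is not the code in `DeltaDeleteDeletionVectorBenchmark`; it only assumes that each active `add` action in a Delta snapshot may carry a deletion-vector descriptor with `cardinality` and `sizeInBytes` fields, as the Delta protocol describes. The class and function names (`DvDescriptor`, `AddFile`, `DvSummary`, `summarize`) are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DvDescriptor:
    """Hypothetical stand-in for a Delta deletion-vector descriptor."""
    cardinality: int    # number of rows the DV marks as deleted
    size_in_bytes: int  # serialized size of the DV payload


@dataclass(frozen=True)
class AddFile:
    """Hypothetical stand-in for an active `add` action in a snapshot."""
    path: str
    deletion_vector: Optional[DvDescriptor] = None


@dataclass(frozen=True)
class DvSummary:
    active_files: int
    files_with_dvs: int
    dv_cardinality: int
    payload_bytes: int


def summarize(adds: Sequence[AddFile]) -> DvSummary:
    """Collapse a snapshot's active files into the four counters."""
    dvs = [a.deletion_vector for a in adds if a.deletion_vector is not None]
    return DvSummary(
        active_files=len(adds),
        files_with_dvs=len(dvs),
        dv_cardinality=sum(d.cardinality for d in dvs),
        payload_bytes=sum(d.size_in_bytes for d in dvs),
    )


if __name__ == "__main__":
    # Illustrative values only; not measured benchmark output.
    snapshot = [
        AddFile("part-0.parquet", DvDescriptor(cardinality=3, size_in_bytes=40)),
        AddFile("part-1.parquet"),
        AddFile("part-2.parquet", DvDescriptor(cardinality=2, size_in_bytes=34)),
    ]
    print(summarize(snapshot))
```

A summary of this shape could be taken after the Spark run and after the Gluten run and then compared. Which of the counters the benchmark requires to match exactly is not stated in this description.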
   
   ## Intentionally deferred
   
   - Native bitmap aggregation as the default DELETE bitmap construction path.
   - Plain Parquet target-scan optimization.
   - Production timing hooks.
   - Checksum or stats shortcuts.
   - Any CI performance assertion on noisy speedup numbers.
   
   ## Validation
   
   Local validation after rebasing onto `origin/split/delta-dv-delete-correctness` at `bbb971f71cd4fc690258c54c59333b392f90a8aa`:
   
   - `git diff --check origin/split/delta-dv-delete-correctness...HEAD`
   - `env JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home PATH=/opt/homebrew/opt/openjdk@17/bin:$PATH ./build/mvn -q test-compile -pl backends-velox -am -Pjava-17,spark-3.5,backends-velox,hadoop-3.3,spark-ut,delta -DskipTests`
   - `env JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home PATH=/opt/homebrew/opt/openjdk@17/bin:$PATH ./build/mvn -q test-compile -pl backends-velox -am -Pjava-17,spark-4.0,scala-2.13,backends-velox,hadoop-3.3,spark-ut,delta -DskipTests`
   
   Runtime benchmark execution is still pending because this local Mac checkout cannot start the Velox backend without `darwin/aarch64/libgluten.dylib`. The benchmark class is intentionally added as a draft diagnostic harness so that native CI or a compatible local native build can provide the measured output before the PR is marked ready for review.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

