malinjawi opened a new pull request, #12217: URL: https://github.com/apache/gluten/pull/12217
## What changes

This is the next stacked Delta DV MoR slice after #12216. It adds a focused benchmark harness for persistent deletion-vector DELETE so we can measure the current correctness path before enabling native bitmap construction or target-scan shortcuts.

Stack order:

1. #12197 - DV scan info extraction utility
2. #12198 - JVM Delta DV scan handoff
3. #12215 - DML row-index scan safety
4. #12216 - persistent DV DELETE correctness path
5. This PR - focused DELETE DV diagnostics benchmark

This PR should remain draft until the earlier correctness PR has native CI confidence and we attach runtime benchmark output from CI or a local native build.

## Scope

- Adds `DeltaDeleteDeletionVectorBenchmark` for Delta 3.3 and Delta 4.0.
- Measures Spark DELETE DV baseline against Gluten DELETE DV with native write and DML row-index scan enabled.
- Covers create-DV and update-existing-DV modes.
- Validates correctness during the benchmark by checking active files, files with DVs, DV cardinality, and payload bytes.

## Intentionally deferred

- Native bitmap aggregation as the default DELETE bitmap construction path.
- Plain Parquet target-scan optimization.
- Production timing hooks.
- Checksum or stats shortcuts.
- Any CI performance assertion on noisy speedup numbers.

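For reviewers who want a concrete picture of the correctness check listed under Scope, here is a minimal, language-neutral sketch in Python. It is not the code in `DeltaDeleteDeletionVectorBenchmark`. The `DvSnapshot` type, its field names, and `diff_snapshots` are hypothetical, and the sketch assumes the check compares the four metrics field by field against expected values, which this description does not spell out.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class DvSnapshot:
    """Hypothetical summary of table state after a DELETE (not a Gluten or Delta type)."""
    active_files: int     # data files still referenced by the table
    files_with_dvs: int   # active files that carry a deletion vector
    dv_cardinality: int   # total rows marked deleted across all DVs
    payload_bytes: int    # total size of the serialized DV payloads


def diff_snapshots(expected: DvSnapshot, actual: DvSnapshot) -> List[str]:
    """Return one message per mismatching metric; an empty list means the runs agree."""
    mismatches = []
    for f in fields(DvSnapshot):
        want = getattr(expected, f.name)
        got = getattr(actual, f.name)
        if want != got:
            mismatches.append(f"{f.name}: expected={want}, actual={got}")
    return mismatches


# Illustrative numbers only; these are not measured results.
baseline = DvSnapshot(active_files=8, files_with_dvs=8, dv_cardinality=1000, payload_bytes=4096)
candidate = DvSnapshot(active_files=8, files_with_dvs=8, dv_cardinality=999, payload_bytes=4096)

for message in diff_snapshots(baseline, candidate):
    print(message)
```

Reporting every mismatching metric, instead of failing on the first one, makes a diagnostic run more useful because a single pass shows whether only cardinality diverged or the file counts did too.
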
## Validation

Local validation after rebasing onto `origin/split/delta-dv-delete-correctness` at `bbb971f71cd4fc690258c54c59333b392f90a8aa`:

- `git diff --check origin/split/delta-dv-delete-correctness...HEAD`
- `env JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home PATH=/opt/homebrew/opt/openjdk@17/bin:$PATH ./build/mvn -q test-compile -pl backends-velox -am -Pjava-17,spark-3.5,backends-velox,hadoop-3.3,spark-ut,delta -DskipTests`
- `env JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home PATH=/opt/homebrew/opt/openjdk@17/bin:$PATH ./build/mvn -q test-compile -pl backends-velox -am -Pjava-17,spark-4.0,scala-2.13,backends-velox,hadoop-3.3,spark-ut,delta -DskipTests`

Runtime benchmark execution is still pending because this local Mac checkout cannot start the Velox backend without `darwin/aarch64/libgluten.dylib`. The benchmark class is intentionally added as a draft/diagnostic harness so native CI or a compatible local native build can provide the measured output before review-ready status.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
