malinjawi opened a new pull request, #12214: URL: https://github.com/apache/gluten/pull/12214
What changes are proposed in this pull request?

This PR is the next split for Delta deletion-vector MoR support. It adds the native bitmap primitive needed by later DELETE DV work, without changing DELETE routing or enabling native bitmap construction in the command path yet.

Main changes:
- extend `RoaringBitmapArray` for Delta Portable-format deletion-vector payloads
- add bounded deserialization using CRoaring portable deserialize sizing before `readSafe`
- add native `bitmapaggregator` support for Delta row-index aggregation
- wire the aggregate name through Gluten expression/substrait planning
- add focused native tests for bitmap serialization/deserialization and aggregate behavior
- add `delta_bitmap_benchmark` with construction, partial-merge, and deserialize/probe cases

This PR is intentionally primitive-only:
- no DELETE command routing changes
- no DML row-index scan planning changes
- no plain Parquet target scan optimization
- no native bitmap aggregation enabled as the default DELETE path

Those pieces remain in follow-up split PRs after the primitive and benchmark shape are reviewed.

How was this patch tested?
Post-rebase validation on top of current `upstream/main` (`33be6fb8bf703ac16eae3c75efa919a97d9cdf5a`):
- `git diff --check upstream/main...HEAD`
- `env JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home PATH=/opt/homebrew/opt/openjdk@17/bin:$PATH ./build/mvn test-compile -pl backends-velox -am -Pjava-17,spark-3.5,backends-velox,hadoop-3.3,spark-ut,delta -DskipTests`
- `env JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home PATH=/opt/homebrew/opt/openjdk@17/bin:$PATH ./build/mvn test-compile -pl backends-velox -am -Pjava-17,spark-4.0,scala-2.13,backends-velox,hadoop-3.3,spark-ut,delta -DskipTests`

Focused standalone native validation from the same diff before the final rebase:
- standalone `RoaringBitmapArrayTest`: passed all 9 focused tests
- Delta JVM compatibility: JVM-generated sparse-gap portable fixture for values `1`, `7`, and `1 << 33` is read by native code; native compact portable payload for the same values is read by a Delta 3.3.2 JVM helper with cardinality `3`, all expected contains checks, and last value `8589934592`
- standalone `delta_bitmap_benchmark` construction/merge output: `/tmp/delta_bitmap_benchmark_delete_construction.json`
- standalone `delta_bitmap_benchmark` read/probe output: `/tmp/delta_bitmap_benchmark_read_probe.json`

Benchmark highlights from the standalone run:
- contiguous 1M build+serialize: `7.91 ms`, `132.5M rows/s`
- sparse 1M build+serialize: `9.99 ms`, `105.0M rows/s`
- clustered 1M build+serialize: `10.10 ms`, `103.9M rows/s`
- multi-bucket 256K build+serialize: `2.28 ms`, `114.9M rows/s`
- sparse 1M merge from 64 partials: `1.12 ms`
- contiguous round-robin merge from 64 partials: `1.32 ms`
- sparse deserialize+probe: `487 us` for an 8,192-probe sample

Notes:
- Normal local Gluten C++ target validation is currently blocked by local Velox/build-tree setup issues, so this draft PR is opened to get the regular native CI signal.
- `clang-format` was not available on this local machine after the final rebase; C++ format CI should validate formatting.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: IBM BOB
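
Illustrative note on the compatibility fixture values (`1`, `7`, `1 << 33`): the sketch below shows how a 64-bit row index can be read as a (bucket, offset) pair. It assumes the usual Delta `RoaringBitmapArray` scheme, where the high 32 bits of a row index select a 32-bit Roaring bitmap and the low 32 bits are the value stored in it. `BucketedIndex`, `splitRowIndex`, `joinRowIndex`, and `checkFixtureValues` are hypothetical names used only for illustration; they are not code from this PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for illustration; not taken from the PR.
struct BucketedIndex {
  uint32_t bucket; // high 32 bits: which 32-bit bitmap holds the value (assumed layout)
  uint32_t offset; // low 32 bits: the value stored inside that bitmap
};

// Split a 64-bit row index into its high and low 32-bit halves.
constexpr BucketedIndex splitRowIndex(uint64_t rowIndex) {
  return {static_cast<uint32_t>(rowIndex >> 32),
          static_cast<uint32_t>(rowIndex & 0xFFFFFFFFULL)};
}

// Inverse of splitRowIndex: rebuild the 64-bit row index.
constexpr uint64_t joinRowIndex(BucketedIndex idx) {
  return (static_cast<uint64_t>(idx.bucket) << 32) | idx.offset;
}

// Compile-time checks on the three fixture values named in the PR text.
static_assert(splitRowIndex(1).bucket == 0 && splitRowIndex(1).offset == 1,
              "1 stays in bucket 0");
static_assert(splitRowIndex(7).bucket == 0 && splitRowIndex(7).offset == 7,
              "7 stays in bucket 0");
static_assert(splitRowIndex(1ULL << 33).bucket == 2 &&
                  splitRowIndex(1ULL << 33).offset == 0,
              "1 << 33 lands in bucket 2");
static_assert(joinRowIndex({2, 0}) == 8589934592ULL,
              "bucket 2, offset 0 is the reported last value");

// Runtime round-trip check over the same three values.
inline void checkFixtureValues() {
  const uint64_t values[] = {1ULL, 7ULL, 1ULL << 33};
  for (uint64_t v : values) {
    assert(joinRowIndex(splitRowIndex(v)) == v);
  }
}
```

Under that assumption, `1` and `7` fall in bucket 0 and `1 << 33` falls in bucket 2 with bucket 1 empty, which is presumably what makes the fixture a sparse-gap case.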

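
Illustrative note on the benchmark highlights: the reported throughput can be sanity-checked from the row counts and timings. The sketch below assumes that `1M` and `256K` mean 2^20 and 2^18 rows; that is an inference from the numbers, not something stated in the benchmark output. `millionRowsPerSecond`, `roughlyEqual`, and `checkReportedFigures` are hypothetical helpers, and the 0.2 tolerance only allows for the timings being rounded to two decimals.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Throughput in millions of rows per second for `rows` processed in `millis` ms.
constexpr double millionRowsPerSecond(uint64_t rows, double millis) {
  return static_cast<double>(rows) / (millis / 1000.0) / 1e6;
}

// Loose comparison: the reported timings are rounded, so exact equality is not expected.
inline bool roughlyEqual(double computed, double reported, double tol = 0.2) {
  return std::fabs(computed - reported) < tol;
}

// Recompute the four build+serialize figures under the binary-size assumption.
inline void checkReportedFigures() {
  constexpr uint64_t kOneM = 1ULL << 20;  // assumed meaning of "1M"
  constexpr uint64_t k256K = 1ULL << 18;  // assumed meaning of "256K"
  assert(roughlyEqual(millionRowsPerSecond(kOneM, 7.91), 132.5));   // contiguous
  assert(roughlyEqual(millionRowsPerSecond(kOneM, 9.99), 105.0));   // sparse
  assert(roughlyEqual(millionRowsPerSecond(kOneM, 10.10), 103.9));  // clustered
  assert(roughlyEqual(millionRowsPerSecond(k256K, 2.28), 114.9));   // multi-bucket
}
```

With decimal sizes instead (1,000,000 rows in `7.91 ms`), the same formula gives roughly 126M rows/s, which does not match the reported `132.5M rows/s`; the binary reading is the one consistent with all four figures.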