malinjawi opened a new pull request, #12040:
URL: https://github.com/apache/gluten/pull/12040
What changes are proposed in this pull request?
This PR is the second step in the split Delta deletion-vector (DV) stack,
following #12001.
It adds the native Velox-side Delta DV reader layer that consumes the
roaring bitmap payload facilities introduced by #12001, without adding the
JVM-side Delta scan metadata handoff yet.
Main changes:
- add a native Delta connector and data source backed by the Hive
connector/data source infrastructure
- register a scoped Delta connector alongside the existing scoped Hive
connector for each Velox runtime
- add Delta split metadata types for:
  - deletion-vector descriptors
  - protocol metadata
  - file statistics used for DV validation
  - serialized split payload buffer views
- add `DeltaDeletionVectorReader` to load materialized Delta DV payloads
using `RoaringBitmapArray`
- add `DeltaSplitReader` to validate DV protocol/statistics metadata and
apply row-index filtering semantics
- add focused native unit coverage for connector setup, split metadata, and
deletion-vector reader behavior
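
For readers unfamiliar with row-index filtering, the idea can be sketched as follows. This is a minimal illustration, not the code in this PR: `DeletedRows` and `keptPositions` are hypothetical names, and a sorted `std::vector` stands in for `RoaringBitmapArray`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a materialized deletion vector: an ascending,
// duplicate-free list of deleted file-level row indices. The PR itself
// uses RoaringBitmapArray for this role.
struct DeletedRows {
  std::vector<uint64_t> sorted;

  // True when the given file-level row index is marked as deleted.
  bool contains(uint64_t rowIndex) const {
    return std::binary_search(sorted.begin(), sorted.end(), rowIndex);
  }
};

// Returns the batch-relative positions of the rows that survive the filter.
// `batchStart` is the file-level index of the first row in the batch.
std::vector<uint32_t> keptPositions(
    const DeletedRows& deleted,
    uint64_t batchStart,
    uint32_t batchSize) {
  std::vector<uint32_t> kept;
  kept.reserve(batchSize);
  for (uint32_t i = 0; i < batchSize; ++i) {
    if (!deleted.contains(batchStart + i)) {
      kept.push_back(i);
    }
  }
  return kept;
}
```

The point of the sketch is that deletion checks use file-level row indices, while the keep/drop decision is reported relative to the batch being read.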
This PR is intentionally native-reader only:
- no JVM-side Delta scan metadata handoff yet
- no end-to-end Delta scan offload behavior change yet
Those pieces will be added in follow-up split PRs.
Related issue: #11901.
How was this patch tested?
Added focused native test coverage in:
- `cpp/velox/compute/delta/tests/DeltaConnectorTest.cpp`
- `cpp/velox/compute/delta/tests/DeltaSplitTest.cpp`
- `cpp/velox/compute/delta/tests/DeltaDeletionVectorReaderTest.cpp`
Covered cases:
- Delta connector configuration and connector properties
- split-carried deletion-vector descriptors and logical row-count accounting
- loading materialized DV payloads from `RoaringBitmapArray`
- row deletion checks and keep/drop filter decisions
- empty payload handling and invalid payload rejection
- protocol/statistics validation for DV-bearing splits
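
As a rough illustration of what statistics validation and logical row-count accounting can look like, here is a simplified sketch. The types `DvDescriptor` and `FileStats` and the function `logicalRowCount` are hypothetical and do not mirror the classes added in this PR; the rules shown (a DV-bearing split needs `numRecords`, and the DV cardinality cannot exceed it) are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>

// Hypothetical, simplified metadata carried on a split.
struct DvDescriptor {
  uint64_t cardinality;  // number of rows the deletion vector marks as deleted
};

struct FileStats {
  std::optional<uint64_t> numRecords;  // physical row count, if known
};

// Returns the logical (post-deletion) row count when it can be derived.
// Throws std::invalid_argument when the DV and the statistics disagree.
std::optional<uint64_t> logicalRowCount(
    const std::optional<DvDescriptor>& dv,
    const FileStats& stats) {
  if (!dv.has_value()) {
    // No deletion vector: the logical count equals the physical count.
    return stats.numRecords;
  }
  if (!stats.numRecords.has_value()) {
    // Assumption for this sketch: a DV-bearing split must carry numRecords.
    throw std::invalid_argument("DV present but numRecords is missing");
  }
  if (dv->cardinality > *stats.numRecords) {
    throw std::invalid_argument("DV cardinality exceeds numRecords");
  }
  return *stats.numRecords - dv->cardinality;
}
```

For example, a file with 10 physical rows and a DV of cardinality 3 yields a logical row count of 7, while a cardinality of 11 is rejected.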
Validation run:
- fork preview CI against `malinjawi/incubator-gluten:main` on the combined
PR2 branch: all checks passed after rerunning two infra-flaky jobs
- local `git diff --check upstream/main...HEAD`
- local clang-format pass with `/opt/homebrew/opt/llvm@15/bin/clang-format`
over changed C++ files
Was this patch authored or co-authored using generative AI tooling?
Generated-by: IBM BOB
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]