siddharthteotia opened a new pull request, #18570:
URL: https://github.com/apache/pinot/pull/18570

   ## Summary
   
   Draft / RFC PR for native (Rust + JNI) acceleration of Pinot's query 
execution. Design docs are included in this PR and will be migrated to Google 
Docs for broader community discussion:
   
   - `RUST_REWRITE_DESIGN.md` — strategic doc: goals, gated success metrics, 
phasing, failure modes, engine coverage (SSE/MSE/star-tree/realtime/MV), 
decision log
   - `docs/native/phase-1-design.md` — Phase 1 detailed design: engine 
landscape with verified integration points, JNI interface, kernels, operator 
integration approach, benchmarks, risks, open questions
   
   ## POC scope (Phase 1)
   
   Full agreed-upon scope: **SUM, COUNT, MIN, MAX, DISTINCT_COUNT** on 
**primitive fixed-width types** (INT/LONG/FLOAT/DOUBLE), plus **single-column 
group-by** with a vectorized SwissTable-style hash table and **explicit SIMD 
kernels** for the aggregations. This PR delivers the first vertical slice 
(SUM(LONG) end-to-end, plus plumbing); the remaining surface is sub-phased in 
`docs/native/phase-1-design.md` §14.
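
For readers unfamiliar with the SwissTable layout mentioned above, here is a minimal sketch of a single-column group-by SUM table built on it. It is not the code planned for this PR: `GroupBySum`, its methods, the hash function, the fixed capacity, and the absence of resize and delete are all assumptions made for illustration. A scalar loop over each 16-slot group stands in for the SIMD compare of the control bytes.

```rust
/// Control byte for an unused slot; occupied slots hold a 7-bit tag (0..=127).
const EMPTY: u8 = 0x80;
/// Slots per group; a SIMD version would compare all 16 control bytes at once.
const GROUP: usize = 16;

/// Hypothetical fixed-capacity GROUP BY key -> SUM(value) table (no resize, no delete).
pub struct GroupBySum {
    ctrl: Vec<u8>,     // one control byte per slot
    keys: Vec<i64>,    // group-by key per slot
    sums: Vec<f64>,    // running SUM per slot
    num_groups: usize, // power of two, so masking replaces modulo
    len: usize,        // number of occupied slots
}

impl GroupBySum {
    pub fn with_groups(num_groups: usize) -> Self {
        assert!(num_groups.is_power_of_two());
        let slots = num_groups * GROUP;
        Self {
            ctrl: vec![EMPTY; slots],
            keys: vec![0; slots],
            sums: vec![0.0; slots],
            num_groups,
            len: 0,
        }
    }

    /// Illustrative multiplicative hash; the real hash function is not specified here.
    fn hash(key: i64) -> u64 {
        let h = (key as u64).wrapping_mul(0x9E37_79B9_7F4A_7C15);
        h ^ (h >> 32)
    }

    /// Returns (slot, true) if `key` is present, else (first empty slot, false).
    /// Terminates because `add` always keeps at least one slot empty.
    fn find_slot(&self, key: i64) -> (usize, bool) {
        let h = Self::hash(key);
        let tag = (h >> 57) as u8; // top 7 bits, never equal to EMPTY
        let mut group = (h as usize) & (self.num_groups - 1);
        loop {
            let base = group * GROUP;
            for slot in base..base + GROUP {
                let c = self.ctrl[slot];
                if c == tag && self.keys[slot] == key {
                    return (slot, true);
                }
                if c == EMPTY {
                    return (slot, false);
                }
            }
            // Group is full and holds no match: probe the next group.
            group = (group + 1) & (self.num_groups - 1);
        }
    }

    /// Adds `value` to the running sum for `key`, inserting the key if needed.
    pub fn add(&mut self, key: i64, value: i64) {
        let (slot, found) = self.find_slot(key);
        if !found {
            assert!(self.len + 1 < self.ctrl.len(), "table full; a real table would resize");
            self.ctrl[slot] = (Self::hash(key) >> 57) as u8;
            self.keys[slot] = key;
            self.len += 1;
        }
        self.sums[slot] += value as f64;
    }

    /// Returns the accumulated sum for `key`, if the key was ever added.
    pub fn get(&self, key: i64) -> Option<f64> {
        let (slot, found) = self.find_slot(key);
        found.then(|| self.sums[slot])
    }
}
```

The control-byte array is what makes batch lookup vectorizable: candidate slots are filtered by a cheap one-byte tag comparison before any key is touched.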
   
   ## What's in this PR
   
   - New `pinot-native` module — Cargo workspace + Maven module + JNI bindings; 
builds the native library during the normal `./mvnw` flow when Cargo is on PATH
   - `sum_i64_to_f64` kernel with **explicit SIMD intrinsics** and runtime ISA 
dispatch: NEON (aarch64), AVX2 (x86_64), AVX-512DQ (x86_64), scalar fallback
   - `NativeAggregationRouter` + `NativeSumAggregationFunction` in 
`pinot-core`, plugged into `AggregationFunctionFactory` at a single point that 
covers all six aggregation contexts in Pinot (SSE V1, MSE leaf, MSE 
intermediate, star-tree, realtime, MV refresh)
   - TestNG integration tests: factory routing on/off, null-handling guardrail, 
JNI plumbing, native↔Java result equivalence
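
The runtime ISA dispatch described for `sum_i64_to_f64` can be sketched as follows. This is an illustration under stated assumptions, not the kernel in this PR: the `Backend` enum, `detect_backend`, `sum_scalar`, and `sum_four_lanes` are hypothetical names, the intrinsics kernels themselves are not reproduced (a four-accumulator scalar loop stands in for every vector backend), and converting each element to `f64` before adding is an assumption about the kernel's semantics.

```rust
/// Backends named in the PR description; only the selection logic is sketched.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Scalar,
    Neon,
    Avx2,
    Avx512Dq,
}

/// Picks the widest backend the running CPU supports, checked at runtime.
pub fn detect_backend() -> Backend {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx512dq") {
            return Backend::Avx512Dq;
        }
        if std::arch::is_x86_feature_detected!("avx2") {
            return Backend::Avx2;
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return Backend::Neon;
        }
    }
    Backend::Scalar
}

/// Scalar reference: convert each value to f64 and accumulate in order.
pub fn sum_scalar(values: &[i64]) -> f64 {
    let mut total = 0.0f64;
    for &v in values {
        total += v as f64;
    }
    total
}

/// Stand-in for a vector backend: four independent accumulators that are
/// reduced at the end, mimicking the lane structure of a SIMD kernel.
pub fn sum_four_lanes(values: &[i64]) -> f64 {
    let mut acc = [0.0f64; 4];
    let chunks = values.chunks_exact(4);
    let tail = chunks.remainder();
    for chunk in chunks {
        for lane in 0..4 {
            acc[lane] += chunk[lane] as f64;
        }
    }
    let mut total = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    for &v in tail {
        total += v as f64;
    }
    total
}

/// Dispatching entry point; the name mirrors the kernel in the PR description.
pub fn sum_i64_to_f64(values: &[i64]) -> f64 {
    match detect_backend() {
        Backend::Scalar => sum_scalar(values),
        // Assumption: real NEON/AVX2/AVX-512DQ kernels would be called here.
        Backend::Neon | Backend::Avx2 | Backend::Avx512Dq => sum_four_lanes(values),
    }
}
```

Because lane-wise accumulation changes the order of floating-point additions, a vector backend can differ from the scalar reference in the last bits for inputs that are not exactly representable, so an equivalence test against the scalar path needs either exactly representable inputs or a tolerance.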
   
   ## Status
   
   - **Phase 1.A complete** (POC plumbing + first SIMD kernel)
   - Native engine is **opt-in** via `-Dpinot.native.aggregation.enabled=true`. 
Default behavior is unchanged; Java path is the fallback and remains 
authoritative.
   - Library load failures are non-fatal: `PinotNativeAgg.isAvailable()` 
returns `false` and the factory routes to the Java path. To build without 
Cargo, pass `-DskipNativeBuild=true`.
   
   ## Not yet in this PR (planned follow-ups, per the Phase 1 design doc)
   
   - Phase 1.B: broaden kernels to MIN/MAX × INT/LONG/FLOAT/DOUBLE; add 
`PinotDataBuffer.toNativeAddress()` for zero-copy from the forward index
   - Phase 1.C: integration tests verifying all six aggregation contexts 
actually hit the native path
   - Phase 1.D: SwissTable-based group-by with vectorized batch lookup + 
per-block-all-functions JNI batching
   - Phase 1.E: HLL (clearspring byte-exact parity, kept off the critical path)
   - JMH benchmark vs Java baseline (next step on this branch)
   - x86 hardware verification (current verification is Apple Silicon / NEON 
only)
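
To make the zero-copy idea in Phase 1.B concrete, here is a rough sketch of what the native side could look like once the Java side can hand over a raw address. Everything in it is an assumption: the function name `sum_long_at_address` is hypothetical, the `JNIEnv` and `jclass` parameters and the export attribute of a real JNI entry point are omitted, and the buffer is assumed to hold native-endian, 8-byte-aligned `i64` values that stay valid for the duration of the call.

```rust
use std::slice;

/// Hypothetical native entry point: sums `len` i64 values starting at `addr`.
/// `addr` and `len` use the Rust equivalents of JNI's jlong and jint.
///
/// # Safety
/// The caller must guarantee that `addr` points to at least `len` readable,
/// 8-byte-aligned, native-endian i64 values that are not mutated or freed
/// while this function runs.
pub unsafe extern "C" fn sum_long_at_address(addr: i64, len: i32) -> f64 {
    // Defensive checks: a null address or non-positive length yields 0.0.
    if addr == 0 || len <= 0 {
        return 0.0;
    }
    // View the caller's memory in place: no copy into a Rust-owned buffer.
    let values = unsafe { slice::from_raw_parts(addr as usize as *const i64, len as usize) };
    let mut total = 0.0f64;
    for &v in values {
        total += v as f64;
    }
    total
}
```

If the forward-index buffer were stored in a byte order different from the host's, or were not 8-byte aligned, this sketch would not apply as written; each value would need an unaligned read and a byte swap.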
   
   ## Open for discussion
   
   This PR is intentionally a draft / RFC. Specific things worth community 
input before further investment:
   
   1. Is the strategic framing (cost-per-QPS + tail latency, not "Rust is 
faster") aligned with where the project wants to invest?
   2. Is the factory-level integration point (vs plan-maker fork) the right 
boundary? See `docs/native/phase-1-design.md` §8 and decision log.
   3. Cargo as a build prerequisite — acceptable, or should we lean harder on 
prebuilt binaries shipped in the JAR?
   4. Hardware coverage matrix: which platforms must we support for GA?
   5. Phase 1 gating metrics (`RUST_REWRITE_DESIGN.md` §4.1) — are these the 
right numbers?
   
   ## Test plan
   
   - [x] `cargo test --release` (9 tests pass, including per-backend 
equivalence vs scalar reference)
   - [x] `./mvnw -pl pinot-core -am -Dtest=NativeSumAggregationFunctionTest 
-Dsurefire.failIfNoSpecifiedTests=false test` (5 tests pass: factory routing 
on/off, null-handling guardrail, 100k-element correctness vs Java, INT fallback)
   - [x] Generated assembly inspected: the NEON path stays vectorized through the final reduction
   - [ ] JMH benchmark vs Java baseline (next on this branch)
   - [ ] x86 hardware coverage (cloud bench, follow-up)
   - [ ] Broader engine integration coverage (Phase 1.C)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

