[PR] feat(qdp): hoist encode_from_gpu_ptr_f32 onto QuantumEncoder trait [mahout]

via GitHub Mon, 11 May 2026 19:54:15 -0700


rich7420 opened a new pull request, #1310:
URL: https://github.com/apache/mahout/pull/1310


   ### Related Issues
   
   <!-- Closes #123 -->
   
   ### Changes
   
   - [ ] Bug fix
   - [ ] New feature
   - [x] Refactoring
   - [ ] Documentation
   - [x] Test
   - [ ] CI/CD pipeline
   - [ ] Other
   
   ### Why
   
   The single-sample f32 zero-copy entry point currently exists as a standalone 
`pub unsafe fn` on each encoder type (`AmplitudeEncoder` / `AngleEncoder` / 
`BasisEncoder`), while the batch sibling (`encode_batch_from_gpu_ptr_f32`) 
lives on the `QuantumEncoder` trait with a default `NotImplemented` body.
   
   That asymmetry has a real cost: adding a new encoder with f32 zero-copy 
support today requires the author to remember three separate touch points (the 
inherent fn on the encoder, the dispatcher in `engine.rs`, and the engine-level 
entry method). The batch-side already pays for the trait override discipline; 
the single-sample side should too.
   
   This refactor makes the trait surface symmetric. No new functionality, no 
behaviour change, no kernel changes.
   
   ### How
   
   **Trait change** (`qdp-core/src/gpu/encodings/mod.rs`)
   
   Add `encode_from_gpu_ptr_f32` to `QuantumEncoder` with a default body 
returning `MahoutError::NotImplemented`. Signature mirrors the existing 
`encode_batch_from_gpu_ptr_f32` exactly. Cfg-gated to Linux / CUDA, same as the 
batch sibling.
   
   **Encoder overrides**
   
   - `AmplitudeEncoder`, `AngleEncoder`, `BasisEncoder` each gain a trait 
`impl` of `encode_from_gpu_ptr_f32` that delegates to the existing inherent 
`_with_stream` workhorse fn. Hot paths can keep calling the inherent fn 
directly without a vtable.
   - Encoders that don't have an f32 zero-copy path yet (Phase, IQP, IQP-Z) 
inherit the default `NotImplemented` body — the same pattern used by 
`encode_batch_from_gpu_ptr_f32` today.
   
   **Amplitude alignment** (`qdp-core/src/gpu/encodings/amplitude.rs`)
   
   `AmplitudeEncoder` previously had no standalone 
`encode_from_gpu_ptr_f32_with_stream` — the L2-norm + kernel-launch sequence 
was inlined in `QdpEngine::encode_from_gpu_ptr_f32_with_stream` (`lib.rs`). Add 
the missing inherent fn to mirror the angle / basis pattern, then have 
`QdpEngine` delegate to it (−62 LoC inline, +0 net behaviour).
   
   **Tests** (`qdp-core/tests/gpu_ptr_encoding.rs`)
   
   Four new trait-dispatch tests dispatch through `Encoding::encoder()` (the 
trait object), not the inherent helper:
   
   - `test_trait_encode_from_gpu_ptr_f32_amplitude`
   - `test_trait_encode_from_gpu_ptr_f32_angle`
   - `test_trait_encode_from_gpu_ptr_f32_basis`
   - `test_trait_encode_from_gpu_ptr_f32_default_not_implemented_for_phase`
   
   The last one verifies the default body returns a `NotImplemented` error with 
the expected message — the contract used by callers that fall back to a CPU 
staging path when an encoder doesn't support zero-copy.
   
   ### Test results (local, RTX 2080 Ti)
   
   - Rust unit + integration: all passing
   - GPU-required tests (`gpu_ptr_encoding`, `gpu_fidelity`, 
`gpu_angle_encoding`): all passing (incl. the 4 new ones)
   - Python pytest (`qdp/qdp-python/tests`): all passing
   - Clippy / fmt / license-header: clean
   
   ### Baseline benchmark (no regression)
   
   16 qubits, batch_size = 64, 200 batches, prefetch = 16, RTX 2080 Ti, 
sustained throughput:
   
   | Encoding  | Median (vec/s) | p95 (vec/s) |
   |-----------|---------------:|------------:|
   | amplitude |         14,316 |      14,783 |
   | angle     |        204,209 |     207,091 |
   | basis     |        648,118 |     692,364 |
   
   Amplitude meets the existing baseline gate (≥ 14,015 vec/s); angle / basis 
match or slightly improve over the pre-refactor numbers (the refactor only 
changes call structure, not the kernel path).
   
   ### Migration / compatibility
   
   No public API removed or renamed. The new trait method adds a new override 
point that future encoders should implement when they gain f32 zero-copy 
support; existing encoders without such a path keep their current 
`NotImplemented` behaviour.
   
   Downstream callers that were reaching 
`AmplitudeEncoder::encode_from_gpu_ptr_f32_with_stream` (there were none 
in-tree) gain a new public inherent fn matching the angle / basis layout.
   
   ### Checklist
   
   - [x] Added or updated unit tests for all changes
   - [x] Added or updated documentation for all changes (doc-comments on trait 
method + new amplitude fn explain the symmetry intent)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[PR] feat(qdp): hoist encode_from_gpu_ptr_f32 onto QuantumEncoder trait [mahout]

Reply via email to