[PR] [QDP] Add zero-copy amplitude encoding from float32 GPU tensors [mahout]

via GitHub Sat, 31 Jan 2026 00:12:04 -0800


viiccwen opened a new pull request, #999:
URL: https://github.com/apache/mahout/pull/999


   ### Purpose of PR
   This PR adds `encode_from_gpu_ptr_f32` and 
`encode_from_gpu_ptr_f32_with_stream` to `QdpEngine`, enabling zero-copy 
amplitude encoding from float32 GPU pointers. It relies on the existing 
`GpuStateVector` Float32 support and the `launch_amplitude_encode_f32` / 
`launch_l2_norm_f32` kernels.
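
For reviewers who want a host-side baseline to compare the GPU path against, here is a minimal sketch of what amplitude encoding computes. It is not the QDP implementation: `amplitude_encode_reference` is a hypothetical name, the sketch assumes the state vector holds `2^num_qubits` amplitudes, and it keeps only the real parts (the imaginary parts are taken to be zero).

```rust
/// Host-side reference for real-valued amplitude encoding.
/// Hypothetical helper for illustration only; not part of `QdpEngine`.
pub fn amplitude_encode_reference(
    input: &[f32],
    num_qubits: usize,
) -> Result<Vec<f32>, String> {
    // Assumption: the state vector holds 2^num_qubits amplitudes.
    if num_qubits >= usize::BITS as usize {
        return Err("num_qubits is too large".to_string());
    }
    let state_len = 1usize << num_qubits;

    if input.is_empty() {
        return Err("input must not be empty".to_string());
    }
    if input.len() > state_len {
        return Err(format!(
            "input_len {} exceeds state_len {}",
            input.len(),
            state_len
        ));
    }

    // Accumulate in f64 to limit rounding error in the sum of squares.
    let norm = input
        .iter()
        .map(|&x| f64::from(x) * f64::from(x))
        .sum::<f64>()
        .sqrt();
    if norm == 0.0 || !norm.is_finite() {
        return Err("L2 norm must be non-zero and finite".to_string());
    }
    let inv_norm = (1.0 / norm) as f32;

    // Scale the input to unit L2 norm and zero-pad up to state_len.
    let mut state = vec![0.0f32; state_len];
    for (dst, &src) in state.iter_mut().zip(input) {
        *dst = src * inv_norm;
    }
    Ok(state)
}
```

Encoding `[3.0, 4.0]` on two qubits gives a four-element vector whose first two entries are approximately `0.6` and `0.8`, with the rest zero.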
   
   ### Changes
   
   #### `qdp/qdp-core/src/lib.rs`
   
   - **`validate_cuda_input_ptr`**: Signature changed from `(device, ptr: 
*const f64)` to `(device, ptr: *const c_void)` so the same helper can validate 
both f64 and f32 pointers. Call sites for `encode_from_gpu_ptr`, 
`encode_from_gpu_ptr_with_stream`, and `encode_batch_from_gpu_ptr_with_stream` 
now pass the raw pointer cast to `*const c_void`.
   - **`encode_from_gpu_ptr_f32`**: New public unsafe function `(input_d: 
*const f32, input_len: usize, num_qubits: usize) -> Result<*mut 
DLManagedTensor>`. Uses the default CUDA stream; delegates to 
`encode_from_gpu_ptr_f32_with_stream` with `stream = null`.
   - **`encode_from_gpu_ptr_f32_with_stream`**: New public unsafe function with 
explicit `stream`. Performs:
     - Input validation (non-empty, `input_len <= state_len`), 
`validate_cuda_input_ptr` on `input_d`.
     - Allocate Float32 state via `GpuStateVector::new(device, num_qubits, 
Precision::Float32)`.
     - Compute inverse L2 norm with 
`AmplitudeEncoder::calculate_inv_norm_gpu_f32_with_stream`.
     - Call `launch_amplitude_encode_f32`, synchronize the stream, then return 
`state_vector.to_dlpack()` when engine precision is Float32; otherwise return 
`NotImplemented` with a clear message.
   - **# Safety**: Doc comments for the new APIs use the same structure as the 
f64 variants (bullet list for input pointer, plus stream requirement for the 
`_with_stream` variant).
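
The two patterns described above, a type-erased pointer check shared by f64 and f32 callers and a default-stream wrapper that delegates with a null stream, can be sketched on the host as follows. All names here are hypothetical stand-ins. The check only tests for null, whereas the real `validate_cuda_input_ptr` also takes a `device` and presumably does more. The functions are safe because nothing is dereferenced, and `state_len = 2^num_qubits` is an assumption.

```rust
use std::ffi::c_void;
use std::ptr;

/// Hypothetical stand-in for a type-erased pointer check.
/// It only rejects null pointers; it does not query the CUDA runtime.
fn validate_input_ptr(ptr: *const c_void) -> Result<(), String> {
    if ptr.is_null() {
        Err("input pointer is null".to_string())
    } else {
        Ok(())
    }
}

/// Explicit-stream variant. The stream is unused in this host-only sketch.
/// Returns the state length that would be allocated.
fn encode_f32_with_stream_sketch(
    input: *const f32,
    input_len: usize,
    num_qubits: usize,
    stream: *mut c_void,
) -> Result<usize, String> {
    let _ = stream; // a real implementation would launch kernels on this stream

    // Assumption: the state vector holds 2^num_qubits amplitudes.
    if num_qubits >= usize::BITS as usize {
        return Err("num_qubits is too large".to_string());
    }
    let state_len = 1usize << num_qubits;

    if input_len == 0 {
        return Err("input must not be empty".to_string());
    }
    if input_len > state_len {
        return Err("input_len exceeds state_len".to_string());
    }

    // f32 call site: erase the element type before validating.
    validate_input_ptr(input.cast::<c_void>())?;
    Ok(state_len)
}

/// Default-stream variant: delegates with a null stream.
fn encode_f32_sketch(
    input: *const f32,
    input_len: usize,
    num_qubits: usize,
) -> Result<usize, String> {
    encode_f32_with_stream_sketch(input, input_len, num_qubits, ptr::null_mut())
}
```

An f64 caller would reuse the same check by passing `ptr as *const c_void`, which is what lets one helper serve both precisions.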
   
   #### `qdp/qdp-core/src/gpu/encodings/amplitude.rs`
   
   - **`calculate_inv_norm_gpu_f32`**: Refactored to call 
`calculate_inv_norm_gpu_f32_with_stream(device, input_ptr, len, 
std::ptr::null_mut())`.
   - **`calculate_inv_norm_gpu_f32_with_stream`**: New public unsafe function 
`(device, input_ptr: *const f32, len, stream: *mut c_void) -> Result<f32>`. 
Uses `launch_l2_norm_f32` on the given stream, then `sync_cuda_stream(stream)` 
before copying the norm to host and validating (non-zero, finite).
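
The host-side validation step (non-zero, finite) can be illustrated with a small sketch. Both function names are hypothetical, and the norm is computed on the CPU here purely for demonstration; in the PR it comes from `launch_l2_norm_f32` and is copied to the host only after the stream has been synchronized.

```rust
/// CPU stand-in for the GPU L2-norm reduction (illustration only).
fn l2_norm_host(input: &[f32]) -> f32 {
    input.iter().map(|&x| x * x).sum::<f32>().sqrt()
}

/// Reject norms that would make the scaling factor meaningless:
/// zero (division by zero), NaN, or infinity.
fn checked_inv_norm(norm: f32) -> Result<f32, String> {
    if norm == 0.0 || !norm.is_finite() {
        return Err(format!("invalid L2 norm: {}", norm));
    }
    Ok(1.0 / norm)
}
```

Checking the norm before inverting it keeps an all-zero or overflowing input from silently producing an infinite or NaN state vector.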
   
   ### Related Issues or PRs
   Closes #996; also a follow-up PR for #995.
   
   ### Changes Made
   <!-- Please mark one with an "x"   -->
   - [ ] Bug fix
   - [x] New feature
   - [ ] Refactoring
   - [ ] Documentation
   - [x] Test
   - [ ] CI/CD pipeline
   - [ ] Other
   
   ### Breaking Changes
   <!-- Does this PR introduce a breaking change? -->
   - [ ] Yes
   - [x] No
   
   ### Checklist
   <!-- Please mark each item with an "x" when complete -->
   <!-- If not all items are complete, please open this as a **Draft PR**.
   Once all requirements are met, mark as ready for review. -->
   
   - [x] Added or updated unit tests for all changes
   - [ ] Added or updated documentation for all changes
   - [x] Successfully built and ran all unit tests or manual tests locally
   - [ ] PR title follows "MAHOUT-XXX: Brief Description" format (if related to 
an issue)
   - [x] Code follows ASF guidelines
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.