viiccwen opened a new pull request, #916:
URL: https://github.com/apache/mahout/pull/916
### Purpose of PR
This PR implements GPU-accelerated L2 norm calculation for single float32
vectors, which is a foundational component for supporting float32 CUDA tensor
encoding in QDP.
## Changes
### CUDA Kernels (`qdp/qdp-kernels/src/amplitude.cu`)
- Added `warp_reduce_sum_f32()` helper function for efficient warp-level
reduction using shuffle instructions
- Added `block_reduce_sum_f32()` helper function for block-level reduction
using shared memory
- Added `l2_norm_kernel_f32()` kernel that:
  - Uses `float2` vectorized loads for 64-bit memory transactions
  - Processes two elements per thread per load to make better use of memory bandwidth
  - Uses warp/block reduction to accumulate the sum of squares
  - Supports arbitrary input lengths, including odd lengths where the last element has no pair
- Added `finalize_inv_norm_kernel_f32()` kernel to convert accumulated
sum-of-squares to inverse norm using `rsqrtf()`
- Added `launch_l2_norm_f32()` launch function with proper grid size
calculation and error handling
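
The arithmetic behind the kernels listed above can be emulated on the CPU. The following is a minimal Rust sketch, not the CUDA code from this PR: the names `sum_of_squares_paired`, `inv_norm_from_sum`, and `grid_size` are hypothetical, and the grid-size formula assumes one pair of elements per thread with no grid-stride loop, which may differ from the actual launch function.

```rust
/// CPU emulation of the paired accumulation: elements are consumed two at a
/// time (as a `float2` load would), and an odd tail element is added alone.
fn sum_of_squares_paired(input: &[f32]) -> f32 {
    let mut acc = 0.0f32;
    let mut pairs = input.chunks_exact(2);
    for pair in pairs.by_ref() {
        acc += pair[0] * pair[0] + pair[1] * pair[1];
    }
    // At most one leftover element when the length is odd.
    for &x in pairs.remainder() {
        acc += x * x;
    }
    acc
}

/// Mirrors the finalize step: sum of squares -> inverse norm.
/// A zero sum yields infinity here; how the PR treats that case is not shown.
fn inv_norm_from_sum(sum_sq: f32) -> f32 {
    1.0 / sum_sq.sqrt()
}

/// Assumed grid-size rule: each thread covers two elements, so a block of
/// `block_size` threads covers `2 * block_size` elements (rounded up).
fn grid_size(len: usize, block_size: usize) -> usize {
    let elems_per_block = block_size * 2;
    (len + elems_per_block - 1) / elems_per_block
}
```

For a 1000-element vector and an assumed block size of 256, `grid_size(1000, 256)` gives 2 blocks, and `inv_norm_from_sum(sum_of_squares_paired(&[3.0, 4.0]))` evaluates to roughly 0.2.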
### Rust Bindings (`qdp/qdp-kernels/src/lib.rs`)
- Added the `launch_l2_norm_f32` function declaration
- Added a stub implementation for non-Linux platforms (returns error code 999)
### Tests (`qdp/qdp-kernels/tests/amplitude_encode.rs`)
- Added `test_l2_norm_single_kernel_f32()` test case
- Test verifies correctness against CPU calculation
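
The CPU side of such a comparison might look like the sketch below. It is illustrative only: `cpu_inv_l2_norm` and `approx_eq` are hypothetical helpers, the relative tolerance is an assumption, and the value read back from the GPU is left to the caller because the launch signature is not shown in this description.

```rust
/// Reference inverse L2 norm, accumulated in f64 to limit rounding error
/// before narrowing back to f32.
fn cpu_inv_l2_norm(input: &[f32]) -> f32 {
    let sum_sq: f64 = input.iter().map(|&x| (x as f64) * (x as f64)).sum();
    (1.0 / sum_sq.sqrt()) as f32
}

/// Relative comparison between a reference value and a measured value.
/// `rel_tol` is an assumed tolerance, not one taken from the PR.
fn approx_eq(a: f32, b: f32, rel_tol: f32) -> bool {
    (a - b).abs() <= rel_tol * a.abs().max(b.abs())
}
```

In a real test, the second argument to `approx_eq` would be the inverse norm copied back from device memory after the kernel launch, for example `approx_eq(cpu_inv_l2_norm(&host_input), gpu_inv_norm, 1e-5)`, where `host_input` and `gpu_inv_norm` are placeholder names.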
### Related Issues or PRs
Closes #915
### Changes Made
- [ ] Bug fix
- [x] New feature
- [ ] Refactoring
- [ ] Documentation
- [x] Test
- [ ] CI/CD pipeline
- [ ] Other
### Breaking Changes
- [ ] Yes
- [x] No
### Checklist
- [x] Added or updated unit tests for all changes
- [ ] Added or updated documentation for all changes
- [x] Successfully built and ran all unit tests or manual tests locally
- [x] PR title follows "MAHOUT-XXX: Brief Description" format (if related to
an issue)
- [x] Code follows ASF guidelines