viiccwen opened a new pull request, #1108:
URL: https://github.com/apache/mahout/pull/1108

   ### Related Issues
   
   Closes #1107 
   
   ### Changes
   
   - [x] Bug fix
   - [ ] New feature
   - [ ] Refactoring
   - [ ] Documentation
   - [ ] Test
   - [ ] CI/CD pipeline
   - [ ] Other
   
   ### Why
   
   This PR fixes misaligned vector loads in the batch amplitude and batched norm CUDA kernels.
   
   When batch samples have an odd length, the base address of later samples is not guaranteed to be aligned for `double2` / `float2` loads: each odd-indexed sample starts an odd number of elements past the batch base, which satisfies the element alignment but not the wider vector alignment. The existing kernels could therefore issue misaligned memory accesses and surface `CUDA_ERROR_MISALIGNED_ADDRESS`.
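   
   The address arithmetic behind this can be checked on the host. The following is a minimal sketch, not code from the PR: `is_aligned` and `sample_base` are hypothetical helpers, and the batch is assumed to be densely packed with a base address that is itself 16-byte aligned.

   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <cstdint>

   // Hypothetical helper (not from the PR): true when the byte address `addr`
   // is a multiple of `alignment`, which must be a power of two.
   inline bool is_aligned(std::uintptr_t addr, std::size_t alignment) {
       assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
       return (addr & (alignment - 1)) == 0;
   }

   // Hypothetical helper: byte address of sample `sample_idx` in a densely
   // packed batch that starts at `batch_base`, where every sample holds
   // `sample_len` elements of `elem_size` bytes each.
   inline std::uintptr_t sample_base(std::uintptr_t batch_base,
                                     std::size_t sample_idx,
                                     std::size_t sample_len,
                                     std::size_t elem_size) {
       return batch_base + sample_idx * sample_len * elem_size;
   }
   ```

   Under these assumptions, with `sample_len = 3` and `double` elements, sample 1 starts 24 bytes past the batch base: a multiple of 8 (fine for `double`) but not of 16 (required for `double2`). Sample 2 starts at 48 bytes and is aligned again, so only odd-indexed samples are affected.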
   
   ### How
   
   - Updated `amplitude_encode_batch_kernel` to use vectorized `double2` loads only when the sample base is aligned
   - Added scalar fallback for misaligned sample bases and odd tails in the batch amplitude kernel
   - Updated `l2_norm_batch_kernel` with the same alignment-aware load logic
   - Updated `l2_norm_batch_kernel_f32` with the same alignment-aware load logic
   - Refreshed kernel comments to reflect the new aligned fast path plus scalar fallback behavior
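   
   The aligned fast path plus scalar fallback described above can be illustrated with a single-threaded host-side sketch. This is not the kernel code from the PR: `NormResult` and `sum_squares` are hypothetical names, the pair step stands in for a single `double2` load, and the real kernels additionally stride across threads and reduce partial sums.

   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <cstdint>

   // Hypothetical result type: the sum of squares plus counters showing
   // which load path handled the elements.
   struct NormResult {
       double sum_sq;
       std::size_t pair_loads;    // two-element steps (a double2 load in a kernel)
       std::size_t scalar_loads;  // one-element steps (fallback and odd tail)
   };

   // Sum of squares for one sample of `len` doubles starting at `sample`.
   inline NormResult sum_squares(const double* sample, std::size_t len) {
       assert(sample != nullptr || len == 0);
       NormResult r{0.0, 0, 0};
       std::size_t i = 0;

       // Take the fast path only when the sample base is 16-byte aligned,
       // which is the alignment a double2 load would require.
       const bool base_aligned =
           reinterpret_cast<std::uintptr_t>(sample) % 16 == 0;
       if (base_aligned) {
           for (; i + 1 < len; i += 2) {
               const double x = sample[i];
               const double y = sample[i + 1];
               r.sum_sq += x * x + y * y;
               ++r.pair_loads;
           }
       }

       // Scalar fallback: the whole sample when the base is misaligned,
       // otherwise just the odd tail element, if there is one.
       for (; i < len; ++i) {
           r.sum_sq += sample[i] * sample[i];
           ++r.scalar_loads;
       }
       return r;
   }
   ```

   For a 16-byte-aligned batch of two samples of length 3, the first sample takes one pair step plus one scalar tail element, while the second sample starts 24 bytes in and is processed entirely by the scalar loop. Both paths yield the same sum, which is the property the regression tests for odd-length batches would exercise.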
   
   ### Tests
   
   - Added a regression test for odd-length batched amplitude encoding
   - Added a regression test for odd-length batched L2 norm reduction (f64)
   - Added a regression test for odd-length batched L2 norm reduction (f32)
   
   ### Checklist
   
   - [x] Added or updated unit tests for all changes
   - [ ] Added or updated documentation for all changes
   

