ryankert01 commented on code in PR #941:
URL: https://github.com/apache/mahout/pull/941#discussion_r2725294517
##########
qdp/qdp-kernels/src/amplitude.cu:
##########
@@ -371,17 +371,32 @@ __global__ void l2_norm_batch_kernel(
double local_sum = 0.0;
+ // Alignment peel for double2 (16B) loads
Review Comment:
The comment should explain the alignment peel in more detail:
```suggestion
// Alignment peel for double2 (16B) loads
// 1. Alignment check: check whether `base` is even (16-byte aligned for
//    `double2`).
// 2. Peel first element: if misaligned, handle the first element
//    separately using a single `double` load.
// 3. Vectorized loads: after the peel, start vectorized `double2` loads
//    from the second element (`base + 1`), which is then 16-byte aligned.
```
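For reference, the three steps could look roughly like the sketch below. It is only an illustration under stated assumptions, not the kernel in this PR: `sum_squares_peeled`, `QDP_HD`, and the host-side `double2` stand-in are hypothetical names. It also checks the pointer address directly rather than the parity of `base`, and assumes `data` is at least 8-byte aligned.

```cpp
#include <cstddef>
#include <cstdint>

#ifdef __CUDACC__
#define QDP_HD __host__ __device__   // hypothetical macro, not from the PR
#else
#define QDP_HD
// Host-only stand-in so the sketch also compiles without nvcc;
// under CUDA the built-in double2 (16-byte aligned) is used instead.
struct alignas(16) double2 { double x, y; };
#endif

// Hypothetical helper: sum of squares of data[base .. base + len).
// Assumes `data` is at least 8-byte aligned (natural alignment of double).
QDP_HD inline double sum_squares_peeled(const double* data,
                                        std::size_t base,
                                        std::size_t len) {
    const double* p = data + base;
    double sum = 0.0;
    std::size_t i = 0;

    // 1. Alignment check: is the first element on a 16-byte boundary?
    const bool misaligned =
        (reinterpret_cast<std::uintptr_t>(p) & 15u) != 0;

    // 2. Peel first element: one scalar load if the start is misaligned.
    if (len > 0 && misaligned) {
        sum += p[0] * p[0];
        i = 1;
    }

    // 3. Vectorized loads: p + i is now 16-byte aligned, so each iteration
    //    reads two doubles with a single double2 load (common CUDA idiom).
    for (; i + 1 < len; i += 2) {
        const double2 v = *reinterpret_cast<const double2*>(p + i);
        sum += v.x * v.x + v.y * v.y;
    }

    // Tail: at most one element is left when the remaining count is odd.
    if (i < len) {
        sum += p[i] * p[i];
    }
    return sum;
}
```

Checking the address instead of the index keeps the helper correct even when the buffer itself is not 16-byte aligned. The parity check on `base` is equivalent when the buffer start is 16-byte aligned.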