viiccwen opened a new issue, #1107:
URL: https://github.com/apache/mahout/issues/1107

   ### What
   
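   The batch amplitude encoding and batched L2 norm CUDA kernels assume that 
each sample's base address is aligned for `double2` / `float2` vector loads. 
That assumption does not hold when `sample_len` is odd: assuming `input_batch` 
itself is vector-aligned, every sample with an odd `sample_idx` then starts an 
odd number of elements past `input_batch`.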
   
   In those cases:
   
   - `input_batch + sample_idx * sample_len` is only naturally aligned to the 
scalar type
   - reinterpreting that address as `double2*` or `float2*` can produce 
misaligned accesses
   - CUDA may surface this as `CUDA_ERROR_MISALIGNED_ADDRESS`
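
The offset arithmetic behind these points can be checked on the host. The following is a minimal sketch, not code from the Mahout sources: `misaligned_samples` is a hypothetical helper, and it assumes `double` elements and a batch base pointer that is itself 16-byte aligned (the alignment `double2` requires).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Mahout): lists the sample indices whose
// base address would NOT be 16-byte aligned, i.e. not safe to reinterpret
// as double2*. Assumes the batch base pointer itself is 16-byte aligned.
inline std::vector<std::size_t> misaligned_samples(std::size_t num_samples,
                                                   std::size_t sample_len) {
    constexpr std::size_t kVecBytes = 2 * sizeof(double);  // size of a double2
    std::vector<std::size_t> result;
    for (std::size_t sample_idx = 0; sample_idx < num_samples; ++sample_idx) {
        // Byte offset of input_batch + sample_idx * sample_len.
        const std::size_t byte_offset = sample_idx * sample_len * sizeof(double);
        if (byte_offset % kVecBytes != 0) {
            result.push_back(sample_idx);
        }
    }
    return result;
}
```

Under those assumptions, `misaligned_samples(4, 3)` flags samples 1 and 3 (byte offsets 24 and 72), while an even `sample_len` such as 4 flags none.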
   
   ### Affected kernels
   
   - `amplitude_encode_batch_kernel`
   - `l2_norm_batch_kernel`
   - `l2_norm_batch_kernel_f32`
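
One possible guard for kernels like these is to test the sample base address before taking the vector path, and to peel a single scalar element when it is not vector-aligned. The sketch below shows that idea as host-side C++ so it can be run without a GPU. `can_use_vec2_loads` and `sum_squares` are hypothetical names, the kernels' real signatures are not shown in this issue, and the sketch assumes the pointer is at least aligned to the scalar type.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical guard (not from the Mahout sources): true only if a 2-wide
// vector load (double2 / float2 style) at sample_base would be aligned
// to 2 * sizeof(T) bytes.
template <typename T>
bool can_use_vec2_loads(const T* sample_base) {
    const auto addr = reinterpret_cast<std::uintptr_t>(sample_base);
    return addr % (2 * sizeof(T)) == 0;
}

// Hypothetical host-side sum of squares for one sample. It mirrors the
// structure a kernel could use: peel one scalar element if the base is
// misaligned, process the rest in pairs, then handle a leftover element.
template <typename T>
T sum_squares(const T* sample_base, std::size_t sample_len) {
    T acc = T(0);
    std::size_t i = 0;
    if (sample_len > 0 && !can_use_vec2_loads(sample_base)) {
        acc += sample_base[0] * sample_base[0];
        i = 1;  // sample_base + 1 is now aligned to 2 * sizeof(T)
    }
    // In a kernel, this loop is where aligned vector loads would be safe.
    for (; i + 1 < sample_len; i += 2) {
        acc += sample_base[i] * sample_base[i] +
               sample_base[i + 1] * sample_base[i + 1];
    }
    if (i < sample_len) {
        acc += sample_base[i] * sample_base[i];  // trailing element
    }
    return acc;
}
```

For a 16-byte-aligned buffer of six doubles treated as two samples of length 3, the second sample starts at byte offset 24, so the guard returns false for it and the peeled path is taken.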
   

