rich7420 commented on code in PR #1029:
URL: https://github.com/apache/mahout/pull/1029#discussion_r2793356287
##########
qdp/qdp-core/src/gpu/encodings/amplitude.rs:
##########
@@ -40,8 +40,8 @@ use crate::gpu::memory::{ensure_device_memory_available,
map_allocation_error};
use cudarc::driver::{DevicePtr, DevicePtrMut};
#[cfg(target_os = "linux")]
use qdp_kernels::{
- launch_amplitude_encode, launch_amplitude_encode_batch, launch_l2_norm,
launch_l2_norm_batch,
- launch_l2_norm_f32,
+ launch_amplitude_encode, launch_amplitude_encode_batch,
launch_amplitude_encode_batch_f32,
+ launch_l2_norm, launch_l2_norm_batch, launch_l2_norm_batch_f32,
launch_l2_norm_f32,
Review Comment:
In this file around lines 251–276 and 351–376, amplitude_encode_batch_kernel
/ _f32 compute input_base = sample_idx * input_len and then do
reinterpret_cast<const double2*>(input_batch + input_base) + elem_pair /
float2. For odd input_len and sample_idx > 0 this base pointer is only 8‑byte
(double) / 4‑byte (float) aligned, not 16‑byte, so the double2/float2 loads are
potentially misaligned. This alignment pattern already existed in the original
f64 batch kernel and this PR copies it into the new f32 batch path; please
either enforce even input_len at the Rust call‑site or rework the kernels to
index from a properly aligned double2* / float2* base pointer with a scalar
fallback.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]