Re: [PR] [QDP] Add zero-copy amplitude batch encoding from float32 GPU tensors [mahout]

via GitHub Wed, 11 Feb 2026 05:41:48 -0800


rich7420 commented on code in PR #1029:
URL: https://github.com/apache/mahout/pull/1029#discussion_r2793356287



##########
qdp/qdp-core/src/gpu/encodings/amplitude.rs:
##########
@@ -40,8 +40,8 @@ use crate::gpu::memory::{ensure_device_memory_available, 
map_allocation_error};
 use cudarc::driver::{DevicePtr, DevicePtrMut};
 #[cfg(target_os = "linux")]
 use qdp_kernels::{
-    launch_amplitude_encode, launch_amplitude_encode_batch, launch_l2_norm, 
launch_l2_norm_batch,
-    launch_l2_norm_f32,
+    launch_amplitude_encode, launch_amplitude_encode_batch, 
launch_amplitude_encode_batch_f32,
+    launch_l2_norm, launch_l2_norm_batch, launch_l2_norm_batch_f32, 
launch_l2_norm_f32,

Review Comment:
   In this file around lines 251–276 and 351–376, amplitude_encode_batch_kernel 
/ _f32 compute input_base = sample_idx * input_len and then do 
reinterpret_cast<const double2*>(input_batch + input_base) + elem_pair / 
float2. For odd input_len and sample_idx > 0 this base pointer is only 8‑byte 
(double) / 4‑byte (float) aligned, not 16‑byte, so the double2/float2 loads are 
potentially misaligned. This alignment pattern already existed in the original 
f64 batch kernel and this PR copies it into the new f32 batch path; please 
either enforce even input_len at the Rust call‑site or rework the kernels to 
index from a properly aligned double2* / float2* base pointer with a scalar 
fallback.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [QDP] Add zero-copy amplitude batch encoding from float32 GPU tensors [mahout]

Reply via email to