Re: [PR] [QDP] Fix invalid CUDA kernel launch when num_samples exceeds grid dimension limit [mahout]

via GitHub Sun, 01 Feb 2026 09:07:29 -0800


ryankert01 commented on code in PR #968:
URL: https://github.com/apache/mahout/pull/968#discussion_r2751608664



##########
qdp/qdp-kernels/tests/amplitude_encode.rs:
##########
@@ -671,6 +671,75 @@ fn test_l2_norm_batch_kernel_stream() {
     println!("PASS: Batched norm reduction on stream matches CPU");
 }
 
+#[test]
+#[cfg(target_os = "linux")]
+fn test_l2_norm_batch_kernel_grid_limit() {
+    println!("Testing batched L2 norm reduction with grid limit boundary...");
+
+    let device = match CudaDevice::new(0) {
+        Ok(d) => d,
+        Err(_) => {
+            println!("SKIP: No CUDA device available");
+            return;
+        }
+    };
+
+    let sample_len = 4usize;
+    // Limit is queried at runtime (Fermi 65535, CC 3.0+ 2^31-1). Test boundary behavior.
+    const AT_FERMI_LIMIT: usize = 65535;
+    const ABOVE_FERMI_LIMIT: usize = 65536;

Review Comment:
   I guess modern GPUs (2010+) will not emit an error here. Did you test this locally?
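
For context on the boundary being discussed: the maximum grid x-dimension is 65535 on compute capability 2.x (Fermi) and 2^31-1 on compute capability 3.0 and later, so a launch with 65536 blocks only fails on the older limit. A minimal sketch of the clamping arithmetic follows, assuming the kernel covers the remaining samples with a grid-stride loop. The helper names `clamped_grid_dim_x` and `samples_per_block` are hypothetical and are not taken from the PR.

```rust
/// Hypothetical helper (not from the PR): number of blocks to launch along
/// grid x, clamped to the device limit and never zero.
fn clamped_grid_dim_x(num_samples: usize, max_grid_dim_x: usize) -> usize {
    // Guard against a zero limit so the clamp bounds stay valid.
    let upper = max_grid_dim_x.max(1);
    num_samples.clamp(1, upper)
}

/// Hypothetical helper: worst-case number of samples a single block handles
/// when the kernel iterates with a grid-stride loop (ceiling division).
fn samples_per_block(num_samples: usize, grid_dim_x: usize) -> usize {
    (num_samples + grid_dim_x - 1) / grid_dim_x
}
```

With a limit of 65535, 65536 samples are clamped to 65535 blocks and one block processes two samples. With a limit of 2^31-1, the same input launches 65536 blocks unchanged, which is why the failure would not reproduce on a modern GPU unless the limit is overridden in the test.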



