rich7420 commented on code in PR #929:
URL: https://github.com/apache/mahout/pull/929#discussion_r2724290189
##########
qdp/qdp-python/src/lib.rs:
##########
@@ -65,6 +64,14 @@ impl QuantumTensor {
return Err(PyRuntimeError::new_err("Invalid DLPack tensor
pointer"));
}
+ if let Some(stream) = stream {
+ if stream > 0 {
+ qdp_core::dlpack::synchronize_stream(stream as *mut
std::ffi::c_void).map_err(
+ |e| PyRuntimeError::new_err(format!("CUDA stream sync
failed: {}", e)),
+ )?;
+ }
+ }
+
Review Comment:
DLPack stream handling treats special CUDA stream values 1 (legacy default)
and 2 (per-thread default) as raw pointers, which can lead to invalid
cudaStream_t usage and undefined behavior. The array API spec explicitly
defines these values as non-pointer sentinel values for CUDA streams. The
current logic calls synchronize_stream for any stream > 0, passing 1/2
directly. I think this should be mapped to cudaStreamLegacy/cudaStreamPerThread
(or avoided) per spec.
ref:
https://data-apis.org/array-api/latest/API_specification/generated/array_api.array.__dlpack__.html
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]