Lunderberg commented on PR #16689: URL: https://github.com/apache/tvm/pull/16689#issuecomment-1995847276
> This PR also changes the `GPUCopy` of CUDA device API to always using `cudaMemcpyAsync`.

I think this portion of the commit needs to be reverted. Prior to this commit, the [`NDArray::CopyTo`](https://github.com/apache/tvm/blob/main/include/tvm/runtime/ndarray.h#L116) function could be called to transfer an array to/from the GPU and return the transferred array. After this commit, there is no synchronization point after the `cudaMemcpyAsync` and before control returns to the caller of `NDArray::CopyTo`.

* The caller may read from the `NDArray` result immediately after `NDArray::CopyTo` returns. After this commit, this is a read from uninitialized memory.
* The caller may free the backing allocation of the `NDArray` argument immediately after `NDArray::CopyTo` returns. After this commit, this causes CUDA to read from a dangling pointer.

This function is used in many locations that relied on the previous semantics.

* The [`"vm.builtin.to_device"`](https://github.com/apache/tvm/blob/main/src/runtime/relax_vm/builtin.cc#L418) PackedFunc, which is the lowered form of `R.to_device`.
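
To make the synchronization concern concrete, here is a minimal model in plain C++. It is not TVM or CUDA code: `FakeStream`, `CopyToNoSync`, and `CopyToSynced` are hypothetical names, and the queue only imitates the "enqueue now, execute later" behavior of an asynchronous copy on a stream. It is a sketch of the semantics under those assumptions, not the actual implementation in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a device stream: work is queued and does not
// run until Synchronize() is called.
class FakeStream {
 public:
  // Models an asynchronous copy: records the work and returns immediately.
  void MemcpyAsync(float* dst, const float* src, std::size_t count) {
    pending_.push_back([dst, src, count] {
      for (std::size_t i = 0; i < count; ++i) dst[i] = src[i];
    });
  }

  // Models a stream synchronization: runs all queued work before returning.
  void Synchronize() {
    for (auto& task : pending_) task();
    pending_.clear();
  }

  std::size_t NumPending() const { return pending_.size(); }

 private:
  std::vector<std::function<void()>> pending_;
};

// Copy with no synchronization point before returning. The caller gets a
// buffer whose contents have not been written yet, and `src` must stay
// alive until some later Synchronize() call.
std::vector<float> CopyToNoSync(FakeStream& stream,
                                const std::vector<float>& src) {
  std::vector<float> dst(src.size(), 0.0f);
  stream.MemcpyAsync(dst.data(), src.data(), src.size());
  return dst;
}

// Copy followed by a synchronization point. On return the data is in place
// and the caller is free to release `src`.
std::vector<float> CopyToSynced(FakeStream& stream,
                                const std::vector<float>& src) {
  std::vector<float> dst(src.size(), 0.0f);
  stream.MemcpyAsync(dst.data(), src.data(), src.size());
  stream.Synchronize();
  assert(stream.NumPending() == 0);  // nothing is left in flight
  return dst;
}
```

In this model, the result of `CopyToNoSync` still holds its initial zeros until the caller synchronizes the stream, which corresponds to the first bullet above. Freeing `src` before that point would correspond to the dangling-pointer case in the second bullet. `CopyToSynced` models the earlier behavior that existing callers rely on.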
