This is an automated email from the ASF dual-hosted git repository.

tqchen pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm-ffi.git


The following commit(s) were added to refs/heads/main by this push:
     new 803cdc8  [DOCS] Update kernel library guide with device guard (#289)
803cdc8 is described below

commit 803cdc84a4bb4502c8da0e5f69a61c1e7b1a38cf
Author: Yaxing Cai <[email protected]>
AuthorDate: Wed Nov 26 16:04:00 2025 -0800

    [DOCS] Update kernel library guide with device guard (#289)
    
    Update kernel library guide with device guard
---
 docs/guides/kernel_library_guide.rst | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/docs/guides/kernel_library_guide.rst b/docs/guides/kernel_library_guide.rst
index 8966e30..82b1bff 100644
--- a/docs/guides/kernel_library_guide.rst
+++ b/docs/guides/kernel_library_guide.rst
@@ -145,6 +145,21 @@ Explicit Update
 
 Once the devices on which the stream contexts reside cannot be inferred from the tensors, the explicit update on stream context table is necessary. TVM FFI provides :py:func:`tvm_ffi.use_torch_stream` and :py:func:`tvm_ffi.use_raw_stream` for manual stream context update. However, it is **recommended** to use implicit update above, to reduce code complexity.
 
+Device Guard
+============
+
+When launching kernels, kernel libraries may require the current device context to be set for a specific device. TVM FFI provides the :cpp:class:`tvm::ffi::CUDADeviceGuard` class to manage this, similar to :cpp:class:`c10::cuda::CUDAGuard`. When a :cpp:class:`tvm::ffi::CUDADeviceGuard` object is constructed with a device index, it saves the original device index (retrieved using ``cudaGetDevice``) and sets the current device to the given index (using ``cudaSetDevice``). Upon destruction  [...]
+
+.. code-block:: c++
+
+ void func(ffi::TensorView input, ...) {
+   // current device index is original device index
+   ffi::CUDADeviceGuard device_guard(input.device().device_id);
+   // current device index is input device index
+ }
+
+After ``func`` returns, the ``device_guard`` is destructed, and the original device index is restored.
+
 Function Exporting
 ==================
 

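For readers unfamiliar with the pattern, the save/set/restore behaviour described in the new "Device Guard" section can be sketched without a GPU. This is only an illustration, not the actual implementation of tvm::ffi::CUDADeviceGuard: the mock namespace, MockDeviceGuard, and device_seen_inside_guard are hypothetical names, and mock::GetDevice / mock::SetDevice merely stand in for cudaGetDevice / cudaSetDevice.

```cpp
#include <cassert>

// Hypothetical stand-ins for cudaGetDevice / cudaSetDevice so the sketch
// runs on any machine. They are NOT part of TVM FFI or the CUDA runtime.
namespace mock {
inline int& current_device() {
  static int device = 0;  // simulated "current device" state
  return device;
}
inline void GetDevice(int* out) { *out = current_device(); }
inline void SetDevice(int index) { current_device() = index; }
}  // namespace mock

// Hypothetical RAII guard illustrating the pattern: save the original
// device on construction, switch to the target, restore on destruction.
class MockDeviceGuard {
 public:
  explicit MockDeviceGuard(int target) {
    mock::GetDevice(&original_);
    if (target != original_) mock::SetDevice(target);
  }
  ~MockDeviceGuard() { mock::SetDevice(original_); }

  // Non-copyable, so the restore happens exactly once per guard.
  MockDeviceGuard(const MockDeviceGuard&) = delete;
  MockDeviceGuard& operator=(const MockDeviceGuard&) = delete;

 private:
  int original_ = 0;
};

// Returns the device index observed while the guard is alive.
int device_seen_inside_guard(int target) {
  MockDeviceGuard guard(target);
  return mock::current_device();
}
```

Because the restore lives in the destructor, it runs on every exit path from the guarded scope, including early returns and exceptions, which is the usual reason to prefer a guard object over paired manual set/reset calls.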