This is an automated email from the ASF dual-hosted git repository.
tqchen pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm-ffi.git
The following commit(s) were added to refs/heads/main by this push:
new 803cdc8 [DOCS] Update kernel library guide with device guard (#289)
803cdc8 is described below
commit 803cdc84a4bb4502c8da0e5f69a61c1e7b1a38cf
Author: Yaxing Cai <[email protected]>
AuthorDate: Wed Nov 26 16:04:00 2025 -0800
[DOCS] Update kernel library guide with device guard (#289)
Update kernel library guide with device guard
---
docs/guides/kernel_library_guide.rst | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/docs/guides/kernel_library_guide.rst b/docs/guides/kernel_library_guide.rst
index 8966e30..82b1bff 100644
--- a/docs/guides/kernel_library_guide.rst
+++ b/docs/guides/kernel_library_guide.rst
@@ -145,6 +145,21 @@ Explicit Update
When the devices on which the stream contexts reside cannot be inferred from
the tensors, the stream context table must be updated explicitly. TVM FFI
provides :py:func:`tvm_ffi.use_torch_stream` and
:py:func:`tvm_ffi.use_raw_stream` for manual stream context updates. However,
the implicit update described above is **recommended**, because it reduces
code complexity.
+Device Guard
+============
+
+When launching kernels, kernel libraries may require the current device
context to be set for a specific device. TVM FFI provides the
:cpp:class:`tvm::ffi::CUDADeviceGuard` class to manage this, similar to
:cpp:class:`c10::cuda::CUDAGuard`. When a
:cpp:class:`tvm::ffi::CUDADeviceGuard` object is constructed with a device
index, it saves the original device index (retrieved using ``cudaGetDevice``)
and sets the current device to the given index (using ``cudaSetDevice``). Upon
destruction [...]
+
+.. code-block:: c++
+
+   void func(ffi::TensorView input, ...) {
+     // current device index is original device index
+     ffi::CUDADeviceGuard device_guard(input.device().device_id);
+     // current device index is input device index
+   }
+
+When ``func`` returns, ``device_guard`` is destroyed and the original device index is restored.
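
The save-and-restore behaviour described above can be illustrated with a minimal RAII sketch. This is not the ``tvm::ffi::CUDADeviceGuard`` implementation. ``DeviceGuardSketch``, ``device_seen_inside_guard``, and the ``mock`` namespace are hypothetical names introduced here, and ``mock::get_device`` / ``mock::set_device`` are stand-ins for ``cudaGetDevice`` / ``cudaSetDevice`` so that the example builds without the CUDA toolkit.

```cpp
#include <cassert>

// Hypothetical stand-ins for cudaGetDevice / cudaSetDevice (assumption:
// a single process-wide "current device" integer is enough for the demo).
namespace mock {
inline int& current_device() {
  static int dev = 0;
  return dev;
}
inline void get_device(int* out) { *out = current_device(); }
inline void set_device(int dev) { current_device() = dev; }
}  // namespace mock

// Illustrative guard only; NOT the real tvm::ffi::CUDADeviceGuard.
class DeviceGuardSketch {
 public:
  // Save the device that is current now, then switch to the target.
  explicit DeviceGuardSketch(int target) {
    mock::get_device(&original_);
    mock::set_device(target);
  }
  // Restore the saved device when the guard leaves scope.
  ~DeviceGuardSketch() { mock::set_device(original_); }

  // A guard owns one pending restore, so copying is disabled.
  DeviceGuardSketch(const DeviceGuardSketch&) = delete;
  DeviceGuardSketch& operator=(const DeviceGuardSketch&) = delete;

 private:
  int original_ = 0;
};

// Returns the device that is current while a guard for `target` is alive.
inline int device_seen_inside_guard(int target) {
  DeviceGuardSketch guard(target);
  return mock::current_device();
}
```

Because the restore happens in the destructor, it also runs on early returns and when an exception unwinds the scope, which is the usual reason for preferring a guard object over paired set/reset calls.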
+
Function Exporting
==================