This is an automated email from the ASF dual-hosted git repository.
junrushao pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm-ffi.git
The following commit(s) were added to refs/heads/main by this push:
new 395db3c Add Support for NVIDIA Ampere GPUs in _get_cuda_target (#440)
395db3c is described below
commit 395db3cef62f430831eb9e927357334ed3fdfade
Author: Yuhong Guo <[email protected]>
AuthorDate: Sat Feb 14 04:54:45 2026 +0800
Add Support for NVIDIA Ampere GPUs in _get_cuda_target (#440)
I'm using SGLang, which relies on TVM-FFI, on a machine equipped with an
NVIDIA A10 GPU, and encountered the following error:
<img width="2864" height="2180" alt="image"
src="https://github.com/user-attachments/assets/d014ed35-7940-44b1-bf36-6950a9d6d14f"
/>
This issue is commonly observed on NVIDIA Ampere-generation GPUs (e.g.,
A10, A100); see the related discussions:
https://github.com/sgl-project/sglang/issues/18108,
https://github.com/sgl-project/sglang/pull/18496,
https://github.com/apache/tvm-ffi/issues/430.
The root cause is that older NVIDIA drivers (commonly deployed on Ampere
systems) do not support the compute_cap query field in nvidia-smi. As a
result, _get_cuda_target fails when trying to auto-detect the CUDA
compute capability.
<img width="862" height="66" alt="image"
src="https://github.com/user-attachments/assets/66af252d-baeb-48f7-af1a-539e27d62899"
/>
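For reference, the failing probe is presumably the `compute_cap` query shown below (the exact command and error text here are an illustration, not copied from `extension.py`):
```bash
# Works on recent drivers, printing e.g. "8.6";
# on legacy drivers nvidia-smi rejects the field with an error such as:
#   Field "compute_cap" is not a valid field to query.
nvidia-smi --query-gpu=compute_cap --format=csv,noheader
```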
To address this, we fall back to querying the GPU name via:
```bash
nvidia-smi --query-gpu=name --format=csv,noheader
```
<img width="728" height="96" alt="image"
src="https://github.com/user-attachments/assets/102c58f4-69ba-4649-ad06-17faaf686699"
/>
and then map known Ampere GPU names (e.g., "NVIDIA A10") to their
corresponding compute capabilities (e.g., 8.6).
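For clarity, here is a minimal standalone sketch of that name-based fallback. It mirrors the patch below; the helper name is only for illustration, and the authoritative version is `_get_cuda_target` in `python/tvm_ffi/cpp/extension.py`:
```python
import subprocess
from typing import Optional

# Known Ampere GPUs and their compute capabilities; "A100" is listed before
# "A10" so the longer name matches first (dicts preserve insertion order).
AMPERE_ARCH_MAP = {"A100": ("8", "0"), "A10": ("8", "6")}


def gencode_flag_from_gpu_name() -> Optional[str]:
    """Map the nvidia-smi GPU name to an NVCC -gencode flag, or None if unknown."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        check=True,
        text=True,
    )
    gpu_name = out.stdout.strip().split("\n")[0]  # e.g. "NVIDIA A10"
    for key, (major, minor) in AMPERE_ARCH_MAP.items():
        if key in gpu_name:
            return f"-gencode=arch=compute_{major}{minor},code=sm_{major}{minor}"
    return None
```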
This change enables robust GPU detection on Ampere devices with legacy
drivers while maintaining backward compatibility.
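If neither query succeeds, the patch now raises an error pointing at the
`TVM_FFI_CUDA_ARCH_LIST` environment variable. As an assumption about its usage (the
accepted format is not shown in this commit, so check `extension.py` for details), a
manual override would look something like:
```bash
# Hypothetical value; verify the exact format extension.py expects.
export TVM_FFI_CUDA_ARCH_LIST="8.6"
```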
---------
Co-authored-by: gemini-code-assist[bot]
<176961590+gemini-code-assist[bot]@users.noreply.github.com>
---
python/tvm_ffi/cpp/extension.py | 23 +++++++++++++++++++++--
1 file changed, 21 insertions(+), 2 deletions(-)
diff --git a/python/tvm_ffi/cpp/extension.py b/python/tvm_ffi/cpp/extension.py
index e988a77..b558fb0 100644
--- a/python/tvm_ffi/cpp/extension.py
+++ b/python/tvm_ffi/cpp/extension.py
@@ -154,8 +154,27 @@ def _get_cuda_target() -> str:
         major, minor = compute_cap.split(".")
         return f"-gencode=arch=compute_{major}{minor},code=sm_{major}{minor}"
     except Exception:
-        # fallback to a reasonable default
-        return "-gencode=arch=compute_70,code=sm_70"
+        try:
+            # For old drivers, there is no compute_cap, but we can use the GPU name to determine the architecture.
+            ampere_arch_map = {
+                "A100": ("8", "0"),
+                "A10": ("8", "6"),
+            }
+            status = subprocess.run(
+                args=["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
+                capture_output=True,
+                check=True,
+                text=True,
+            )
+            gpu_name = status.stdout.strip().split("\n")[0]
+            for gpu_key, (major, minor) in ampere_arch_map.items():
+                if gpu_key in gpu_name:
+                    return f"-gencode=arch=compute_{major}{minor},code=sm_{major}{minor}"
+        except (subprocess.CalledProcessError, FileNotFoundError):
+            pass
+        raise RuntimeError(
+            "Could not detect CUDA compute_cap automatically. Please set TVM_FFI_CUDA_ARCH_LIST environment variable."
+        )
 
 
 def _run_command_in_dev_prompt(