The GitHub Actions job "Nightly Docker Update" on tvm.git/main has failed. Run started by GitHub user areusch (triggered by areusch).
Head commit for run:
fa905d2b693e5368ea059a72f3fd1333005f6560 / Kathryn (Jinqi) Chen <[email protected]>
[Compile] accelerate compilation speed using NVRTC (#18519)

This PR supports NVRTC as an alternative to NVCC for faster, device-side JIT compilation of CUDA kernels, in favor of the PR https://github.com/apache/tvm-ffi/pull/283.

It enhances the CUDA compilation backend by:
- Adding Python NVRTC support using cuda-python bindings
- Removing legacy C++ NVRTC fallback in favor of a Python-first approach
- Keeping nvcc as the default compiler with fatbin output (no behavior change for existing users)

Users can choose the compilation backend using an environment variable `TVM_CUDA_COMPILE_MODE`, choosing from "nvcc" and "nvrtc". For example,
`TVM_CUDA_COMPILE_MODE=nvrtc python3 your_program.py`

Here is a short benchmark of the compilation speed of kernels in `test_target_codegen_cuda.py`.

### NVCC vs NVRTC Compilation Time Comparison (Python-side Call)

| Test Case | Code Size | NVCC Time (ms) | NVRTC Time (ms) | Speedup |
| :--- | :--- | :--- | :--- | :--- |
| `test_crossthread_reduction1` | 1945 B | 241.27 | 51.23 | **4.7x** |
| `test_cuda_bf16_vectorize_add` | 3760 B | 342.72 | 44.50 | **7.7x** |
| `test_cuda_const_float_to_half` | 12394 B | 272.85 | 31.99 | **8.5x** |
| `test_cuda_device_func_call` | 975 B | 215.58 | 21.47 | **10.0x** |
| `test_cuda_float_const_hex_format` | 685 B | 217.39 | 20.52 | **10.6x** |
| `test_cuda_floordiv_with_vectorization` | 1050 B | 213.88 | 23.32 | **9.2x** |
| `test_cuda_inf_nan` | 673 B | 214.33 | 24.94 | **8.6x** |
| `test_cuda_tensormap` | 755 B | 213.91 | 20.74 | **10.3x** |
| `test_cuda_thread_sync_inside_condition` | 1007 B | 213.43 | 28.29 | **7.5x** |
| `test_cuda_vectorize_add` | 908 B | 226.81 | 40.39 | **5.6x** |
| `test_cuda_vectorize_load` | 734 B | 217.25 | 24.02 | **9.0x** |
| `test_device_host_call_same_func` | 924 B | 216.03 | 21.21 | **10.2x** |
| `test_vectorized_intrin1` | 847 B | 226.15 | 26.34 | **8.6x** |

### NVSHMEM Support

Currently, NVSHMEM is **not** supported via NVRTC.
- Fallback Behavior: When NVSHMEM is required, the compilation pipeline will automatically fall back to NVCC, even if `TVM_CUDA_COMPILE_MODE` is set to nvrtc.
- Future Roadmap: Support for NVRTC with NVSHMEM is planned for follow-up PRs.

Report URL: https://github.com/apache/tvm/actions/runs/20836125782

With regards,
GitHub Actions via GitBox

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
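
The backend selection and NVSHMEM fallback described in the commit message above can be sketched roughly as follows. This is only an illustration of the stated behavior, not TVM's actual code: the helper name `resolve_compile_mode`, its `needs_nvshmem` parameter, and the choice to raise on an unrecognized value are all assumptions. Only the variable name `TVM_CUDA_COMPILE_MODE`, the two values "nvcc" and "nvrtc", the nvcc default, and the fallback rule come from the commit message.

```python
import os

# The two backends named in the commit message.
VALID_MODES = ("nvcc", "nvrtc")


def resolve_compile_mode(needs_nvshmem, env=None):
    """Hypothetical helper (not a TVM API): choose the CUDA compile backend.

    needs_nvshmem: True when the kernel being compiled requires NVSHMEM.
    env: mapping to read the setting from; defaults to os.environ.
    """
    env = os.environ if env is None else env
    # nvcc stays the default when the variable is unset.
    mode = env.get("TVM_CUDA_COMPILE_MODE", "nvcc").strip().lower()
    if mode not in VALID_MODES:
        # Assumption: reject unknown values. The commit message does not
        # say how TVM itself handles them.
        raise ValueError(
            "TVM_CUDA_COMPILE_MODE must be one of %s, got %r" % (VALID_MODES, mode)
        )
    # NVSHMEM is not supported via NVRTC, so fall back to nvcc.
    if mode == "nvrtc" and needs_nvshmem:
        return "nvcc"
    return mode


if __name__ == "__main__":
    print(resolve_compile_mode(False, {"TVM_CUDA_COMPILE_MODE": "nvrtc"}))
    print(resolve_compile_mode(True, {"TVM_CUDA_COMPILE_MODE": "nvrtc"}))
```

Passing the environment as a parameter keeps the sketch testable without mutating `os.environ`. A real implementation may read the variable elsewhere and handle invalid values differently.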
