This is an automated email from the ASF dual-hosted git repository.
mshr pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git
The following commit(s) were added to refs/heads/main by this push:
new c71aefc745 [Docs] Fix e2e_opt_model tutorial for GPU deployment (#18539)
c71aefc745 is described below
commit c71aefc745e8ab3bb1ee5426a99154a81c30cc4e
Author: Shushi Hong <[email protected]>
AuthorDate: Thu Dec 4 05:18:09 2025 -0500
[Docs] Fix e2e_opt_model tutorial for GPU deployment (#18539)
This PR resolves issue #18481 by fixing two bugs in the end-to-end
optimization tutorial (`docs/how_to/tutorials/e2e_opt_model.py`) that
prevented it from running correctly on GPU devices.
### Changes
1. **Added DefaultGPUSchedule transformation**
- Apply `DefaultGPUSchedule` within the target context so that every GPU
function gets proper thread binding. This fixes the memory verification
error: "`Variable is directly accessed by host memory... Did you forget to bind?`"
2. **Fixed VM output handling**
- Index into the VM output (`[0]`) before calling `.numpy()`, since
`vm["main"]` returns a container of tensors rather than a single tensor.
A combined sketch of both fixes follows this list.
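For reference, here is a minimal sketch of the fixed flow, assuming a
CUDA-enabled TVM build and the `mod` / `params` objects produced earlier in
the tutorial:

```python
# Minimal sketch of the fixed GPU deployment flow (assumes a CUDA-enabled
# TVM build and the Relax IRModule `mod` plus `params` from the tutorial).
import numpy as np
import tvm
from tvm import relax

target = tvm.target.Target("cuda")

# Bind GPU functions to threads so memory verification passes.
with target:
    mod = tvm.tir.transform.DefaultGPUSchedule()(mod)

ex = tvm.compile(mod, target=target)
dev = tvm.device("cuda", 0)
vm = relax.VirtualMachine(ex, dev)

# Allocate input data and params on the GPU device.
gpu_data = tvm.runtime.tensor(np.random.rand(1, 3, 224, 224).astype("float32"), dev)
gpu_params = [tvm.runtime.tensor(p, dev) for p in params["main"]]

# Index into the VM output before converting to NumPy.
gpu_out = vm["main"](gpu_data, *gpu_params)[0].numpy()
print(gpu_out.shape)
```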
---
docs/how_to/tutorials/e2e_opt_model.py | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/docs/how_to/tutorials/e2e_opt_model.py b/docs/how_to/tutorials/e2e_opt_model.py
index 9f89e744a3..8307ddc4f2 100644
--- a/docs/how_to/tutorials/e2e_opt_model.py
+++ b/docs/how_to/tutorials/e2e_opt_model.py
@@ -113,12 +113,14 @@ if not IS_IN_CI:
 # We skip this step in the CI environment.
 if not IS_IN_CI:
-    ex = tvm.compile(mod, target="cuda")
+    with target:
+        mod = tvm.tir.transform.DefaultGPUSchedule()(mod)
+    ex = tvm.compile(mod, target=target)
     dev = tvm.device("cuda", 0)
     vm = relax.VirtualMachine(ex, dev)
     # Need to allocate data and params on GPU device
     gpu_data = tvm.runtime.tensor(np.random.rand(1, 3, 224, 224).astype("float32"), dev)
     gpu_params = [tvm.runtime.tensor(p, dev) for p in params["main"]]
-    gpu_out = vm["main"](gpu_data, *gpu_params).numpy()
+    gpu_out = vm["main"](gpu_data, *gpu_params)[0].numpy()
     print(gpu_out.shape)