This is an automated email from the ASF dual-hosted git repository.

mshr pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git


The following commit(s) were added to refs/heads/main by this push:
     new c71aefc745 [Docs] Fix e2e_opt_model tutorial for GPU deployment (#18539)
c71aefc745 is described below

commit c71aefc745e8ab3bb1ee5426a99154a81c30cc4e
Author: Shushi Hong <[email protected]>
AuthorDate: Thu Dec 4 05:18:09 2025 -0500

    [Docs] Fix e2e_opt_model tutorial for GPU deployment (#18539)
    
    This PR resolves issue #18481 by fixing two bugs in the end-to-end
    optimization tutorial (`docs/how_to/tutorials/e2e_opt_model.py`) that
    prevented it from running correctly on GPU devices.
    
    ### Changes
    
    1. **Added DefaultGPUSchedule transformation**
       - Apply `DefaultGPUSchedule` to ensure all GPU functions have proper
         thread bindings. This fixes the memory verification error:
         "`Variable is directly accessed by host memory... Did you forget to
         bind?`"
    
    
    2. **Fixed VM output handling**
       - Updated the call to index the VM result with `[0]` before calling
         `.numpy()`, since the VM returns the model output wrapped in a
         tuple. See the sketch after this list.
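
    A minimal sketch of the fixed flow, assuming the `mod`, `params`, and
    CUDA `target` produced by the earlier steps of the tutorial:

    ```python
    import numpy as np
    import tvm
    from tvm import relax

    # Assumption: `mod` (the optimized IRModule) and `params` (its extracted
    # weights) come from the preceding tutorial steps; this target is
    # illustrative.
    target = tvm.target.Target("cuda")

    # Bind GPU threads for every TIR function before compiling.
    with target:
        mod = tvm.tir.transform.DefaultGPUSchedule()(mod)
    ex = tvm.compile(mod, target=target)

    dev = tvm.device("cuda", 0)
    vm = relax.VirtualMachine(ex, dev)

    # Inputs and parameters must be allocated on the GPU device.
    gpu_data = tvm.runtime.tensor(
        np.random.rand(1, 3, 224, 224).astype("float32"), dev
    )
    gpu_params = [tvm.runtime.tensor(p, dev) for p in params["main"]]

    # The VM wraps the model output in a tuple; index [0] to get the tensor.
    gpu_out = vm["main"](gpu_data, *gpu_params)[0].numpy()
    print(gpu_out.shape)
    ```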
---
 docs/how_to/tutorials/e2e_opt_model.py | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/docs/how_to/tutorials/e2e_opt_model.py b/docs/how_to/tutorials/e2e_opt_model.py
index 9f89e744a3..8307ddc4f2 100644
--- a/docs/how_to/tutorials/e2e_opt_model.py
+++ b/docs/how_to/tutorials/e2e_opt_model.py
@@ -113,12 +113,14 @@ if not IS_IN_CI:
 # We skip this step in the CI environment.
 
 if not IS_IN_CI:
-    ex = tvm.compile(mod, target="cuda")
+    with target:
+        mod = tvm.tir.transform.DefaultGPUSchedule()(mod)
+    ex = tvm.compile(mod, target=target)
     dev = tvm.device("cuda", 0)
     vm = relax.VirtualMachine(ex, dev)
     # Need to allocate data and params on GPU device
     gpu_data = tvm.runtime.tensor(np.random.rand(1, 3, 224, 224).astype("float32"), dev)
     gpu_params = [tvm.runtime.tensor(p, dev) for p in params["main"]]
-    gpu_out = vm["main"](gpu_data, *gpu_params).numpy()
+    gpu_out = vm["main"](gpu_data, *gpu_params)[0].numpy()
 
     print(gpu_out.shape)
