sugartom opened a new issue #9269:
URL: https://github.com/apache/tvm/issues/9269


   ### Expected behavior
   
   I installed the latest TVM from source and compiled ResNet50 by following the [tutorial](https://tvm.apache.org/docs/tutorial/tvmc_command_line_driver.html). With the target set to "llvm", I did observe a speedup from resnet50-v2-7-tvm.tar to resnet50-v2-7_autotuned.tar.
   
   However, with the target set to "cuda", the autotuned version runs slower than the non-autotuned version on the GPU. Has anyone observed similar behavior before, or is there something I am doing wrong?
   
   ### Environment
   
   - OS: Ubuntu 20.04
   - GPU: NVIDIA GeForce RTX 3080 Ti
   - CUDA: 11.2 (driver 460.91.03)
   - TVM version: 0.8.dev0
   - LLVM version: 13.0.0
   
   ### Steps to reproduce
   
   Steps to generate and test the non-autotuned version:
   ```
   tvmc compile --target "cuda" --output resnet50-v2-7-tvm-cuda.tar \
     resnet50-v2-7.onnx
   tvmc run --device cuda --inputs imagenet_cat.npz --output predictions.npz \
     --print-time --repeat 100 resnet50-v2-7-tvm-cuda.tar
   ```
   My terminal output for the non-autotuned version:
   ```
   Execution time summary:
    mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
      3.4426       3.4623       5.6511       3.0480       0.4118
   ```
   
   Steps to generate and test the autotuned version:
   ```
   tvmc tune --target "cuda" --output resnet50-v2-7-autotuner_records-cuda.json \
     resnet50-v2-7.onnx
   tvmc compile --target "cuda" \
     --tuning-records resnet50-v2-7-autotuner_records-cuda.json \
     --output resnet50-v2-7-tvm_autotuned-cuda.tar resnet50-v2-7.onnx
   tvmc run --device cuda --inputs imagenet_cat.npz --output predictions.npz \
     --print-time --repeat 100 resnet50-v2-7-tvm_autotuned-cuda.tar
   ```
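
For convenience, both pipelines above can be driven from one script. The sketch below is a hypothetical helper (not part of TVM or tvmc) that rebuilds exactly the `tvmc` commands used in this report, prints them, and only executes them when invoked with `--execute`; the constant and function names are illustrative assumptions.

```python
from __future__ import annotations

import shlex
import subprocess
import sys

# Hypothetical constants: file names taken from the commands in this report.
MODEL = "resnet50-v2-7.onnx"
INPUTS = "imagenet_cat.npz"
RECORDS = "resnet50-v2-7-autotuner_records-cuda.json"


def pipeline(tuned: bool) -> list[list[str]]:
    """Return the tvmc commands for the untuned or autotuned variant."""
    package = ("resnet50-v2-7-tvm_autotuned-cuda.tar" if tuned
               else "resnet50-v2-7-tvm-cuda.tar")
    cmds = []
    if tuned:
        # Tuning step only exists for the autotuned variant.
        cmds.append(["tvmc", "tune", "--target", "cuda",
                     "--output", RECORDS, MODEL])
    compile_cmd = ["tvmc", "compile", "--target", "cuda"]
    if tuned:
        compile_cmd += ["--tuning-records", RECORDS]
    compile_cmd += ["--output", package, MODEL]
    cmds.append(compile_cmd)
    cmds.append(["tvmc", "run", "--device", "cuda", "--inputs", INPUTS,
                 "--output", "predictions.npz", "--print-time",
                 "--repeat", "100", package])
    return cmds


def main(execute: bool = False) -> None:
    """Print every command; run it only when execute is True."""
    for tuned in (False, True):
        for cmd in pipeline(tuned):
            print(shlex.join(cmd))
            if execute:
                subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main(execute="--execute" in sys.argv)
```

Printing by default makes it easy to check the exact invocations before spending time on `tvmc tune`.
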
   My terminal output for the autotuned version:
   ```
   Execution time summary:
    mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
      4.8350       5.0163       7.8554       4.4040       0.5398
   ```
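
To put a number on the gap, a small hypothetical helper (not part of tvmc) can parse the two "Execution time summary" blocks printed by `tvmc run --print-time` and compute the tuned/untuned ratio per statistic. It assumes the exact layout shown above: a header row starting with `mean (ms)`, followed by one row of values.

```python
from __future__ import annotations

# Timing summaries copied from the two runs in this report.
UNTUNED = """
Execution time summary:
 mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
   3.4426       3.4623       5.6511       3.0480       0.4118
"""

TUNED = """
Execution time summary:
 mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)
   4.8350       5.0163       7.8554       4.4040       0.5398
"""


def parse_summary(text: str) -> dict[str, float]:
    """Map each statistic name (mean, median, ...) to its value in ms."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header_idx = next(i for i, ln in enumerate(lines) if ln.startswith("mean"))
    # Drop the "(ms)" unit tokens so only the statistic names remain.
    names = [tok for tok in lines[header_idx].split() if tok != "(ms)"]
    values = [float(v) for v in lines[header_idx + 1].split()]
    return dict(zip(names, values))


def slowdown(baseline: dict[str, float], candidate: dict[str, float],
             stat: str = "mean") -> float:
    """Ratio > 1.0 means the candidate is slower than the baseline."""
    return candidate[stat] / baseline[stat]


if __name__ == "__main__":
    base, tuned = parse_summary(UNTUNED), parse_summary(TUNED)
    for stat in ("mean", "median", "min"):
        print(f"{stat}: tuned/untuned = {slowdown(base, tuned, stat):.2f}x")
```

With the numbers reported here, the mean, median, and minimum ratios all land between roughly 1.40x and 1.45x, so the tuned module is consistently slower rather than just noisier.
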
   
   From the above, the autotuned module is slower than the non-autotuned one (mean 4.8350 ms vs. 3.4426 ms).
   I am new to TVM, so this "bug" may well be a naive mistake on my part. Any help would be greatly appreciated, and thanks in advance! :-)

