cgerum opened a new issue, #11388:
URL: https://github.com/apache/tvm/issues/11388

   ### Expected behavior
   
   TVM should compile int8 conv2d and dense kernels when `sm_**` is set to the correct version.
   
   ### Actual behavior
   
   CUDA compilation fails with:
   
      Compilation error:
       /tmp/tmp_paip2b7/my_kernel.cu(150): error: identifier "__dp4a" is undefined
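
For context, here is a minimal sketch of how one might check whether a given `sm_**` value can use `__dp4a` at all. It is not TVM code: `parse_sm_arch` and `supports_dp4a` are hypothetical helper names. It assumes that `__dp4a` is only available from compute capability 6.1 onward, and that the Jetson Nano corresponds to `sm_53`.

```python
import re
from typing import Tuple

# Assumption: __dp4a needs compute capability 6.1 or higher.
DP4A_MIN_CAPABILITY = (6, 1)


def parse_sm_arch(arch: str) -> Tuple[int, int]:
    """Split an arch string such as 'sm_53' into (major, minor)."""
    # The last digit is the minor version; everything before it is the major.
    match = re.fullmatch(r"sm_(\d+)(\d)[a-z]?", arch)
    if match is None:
        raise ValueError(f"unrecognized CUDA arch string: {arch!r}")
    return int(match.group(1)), int(match.group(2))


def supports_dp4a(arch: str) -> bool:
    """Return True if the arch is new enough for the __dp4a intrinsic."""
    return parse_sm_arch(arch) >= DP4A_MIN_CAPABILITY


if __name__ == "__main__":
    for arch in ("sm_53", "sm_60", "sm_61", "sm_75"):
        print(arch, supports_dp4a(arch))
```

If those assumptions hold, `sm_53` falls below the threshold, so an int8 schedule that emits `__dp4a` would need a fallback on this device rather than a different `sm_**` setting.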
   
   ### Environment
   
   TVM Version: current main
   Target System: jetsonnano 4.9.253-tegra #1 SMP PREEMPT Mon Jul 26 12:13:06 PDT 2021 aarch64 aarch64 aarch64 GNU/Linux
   CUDA Version: 10.2
   
   ### Steps to reproduce
   
   This script reproduces the error without needing actual target hardware:
   
   ```python
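   import tflite
   import tvm
   from tvm import relay
   
   # The target tag selects the Jetson Nano CUDA configuration; no physical board is needed.
   target = tvm.target.Target("nvidia/jetson-nano")
   
   # Load the quantized (int8) TFLite model.
   model_path = "pretrainedResnet_quant.tflite"
   with open(model_path, "rb") as f:
       model_buf = f.read()
   tfl_model = tflite.Model.GetRootAsModel(model_buf, 0)
   
   mod, params = relay.frontend.from_tflite(tfl_model)
   
   # Building the module triggers the CUDA compilation error shown above.
   lib = relay.build_module.build(mod, params=params, target=target)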
   ```
   The model file is available from: 
https://github.com/mlcommons/tiny/raw/master/benchmark/training/image_classification/trained_models/pretrainedResnet_quant.tflite
   
   

