cgerum opened a new issue, #11388:
URL: https://github.com/apache/tvm/issues/11388
### Expected behavior
TVM should compile int8 `conv2d` and `dense` kernels when the target's `sm_**` architecture is set to the correct version.
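For context, the `__dp4a` intrinsic is only available on GPUs of compute capability 6.1 and newer. The Jetson Nano (Tegra X1, Maxwell) is compute capability 5.3 (`sm_53`), so kernels built for it cannot rely on `__dp4a`. Below is a minimal sketch of the kind of check that decides whether a given `sm_XY` string can use the intrinsic. `supports_dp4a` is a hypothetical helper for illustration, not a TVM API:

```python
import re


def supports_dp4a(arch: str) -> bool:
    """Return True if a CUDA arch string such as "sm_61" can use __dp4a.

    Hypothetical helper for illustration only (not part of TVM).
    __dp4a requires compute capability >= 6.1.
    """
    # The last digit is the minor version; everything before it is the major
    # version (so "sm_100" parses as 10.0). An optional letter suffix such as
    # the "a" in "sm_90a" is accepted and ignored.
    match = re.fullmatch(r"sm_(\d+)(\d)[a-z]?", arch)
    if match is None:
        raise ValueError(f"unrecognized CUDA arch: {arch!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (6, 1)


print(supports_dp4a("sm_53"))  # False: Jetson Nano
print(supports_dp4a("sm_61"))  # True
```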
### Actual behavior
CUDA compilation fails with:
Compilation error:
`/tmp/tmp_paip2b7/my_kernel.cu(150): error: identifier "__dp4a" is undefined`
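`__dp4a` computes a four-way dot product of packed signed 8-bit lanes and adds it to a 32-bit accumulator in a single instruction. On architectures without it, the same result needs separate multiply-adds. The pure-Python reference below is a sketch of those semantics, i.e. what a fallback code path for a target such as `sm_53` would have to compute. The names `pack_int8x4` and `dp4a_reference` are hypothetical, not TVM or CUDA APIs:

```python
import struct


def pack_int8x4(values):
    """Pack four signed 8-bit values into one 32-bit word (lane 0 = lowest byte)."""
    return struct.unpack("<I", struct.pack("<4b", *values))[0]


def dp4a_reference(a: int, b: int, c: int) -> int:
    """Emulate signed __dp4a: dot product of the four int8 lanes of a and b, plus c."""
    lanes_a = struct.unpack("<4b", struct.pack("<I", a & 0xFFFFFFFF))
    lanes_b = struct.unpack("<4b", struct.pack("<I", b & 0xFFFFFFFF))
    acc = c + sum(x * y for x, y in zip(lanes_a, lanes_b))
    # Keep the result in signed 32-bit range (two's-complement wraparound).
    acc &= 0xFFFFFFFF
    return acc - (1 << 32) if acc >= (1 << 31) else acc


a = pack_int8x4([1, 2, 3, 4])
b = pack_int8x4([5, 6, 7, 8])
print(dp4a_reference(a, b, 10))  # 1*5 + 2*6 + 3*7 + 4*8 + 10 = 80
```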
### Environment
TVM Version: current main
Target System: jetsonnano 4.9.253-tegra #1 SMP PREEMPT Mon Jul 26 12:13:06 PDT 2021 aarch64 aarch64 aarch64 GNU/Linux
CUDA Version: 10.2
### Steps to reproduce
This script reproduces the error without requiring the target hardware:
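```python
import tflite  # TFLite flatbuffer bindings (the `tflite` package on PyPI)
import tvm
from tvm import relay

# Built-in TVM target tag for the Jetson Nano GPU.
target = tvm.target.Target("nvidia/jetson-nano")

# Load the quantized (int8) ResNet model.
model_path = "pretrainedResnet_quant.tflite"
with open(model_path, "rb") as f:
    model_buf = f.read()
tfl_model = tflite.Model.GetRootAsModel(model_buf, 0)

# Import into Relay and compile; CUDA compilation fails with the
# "__dp4a is undefined" error described above.
mod, params = relay.frontend.from_tflite(tfl_model)
lib = relay.build(mod, target=target, params=params)
```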
The model file is available from:
https://github.com/mlcommons/tiny/raw/master/benchmark/training/image_classification/trained_models/pretrainedResnet_quant.tflite