Yes, without hardware support for int8, you shouldn't expect int8 to be any faster than fp32. On AVX2, PyTorch is much faster than TVM for int8. On AVX-512, where int8 does make a difference, TVM is much faster.
I have a script at https://github.com/Edgecortix-Inc/pytorch_quantization/tree/master/tvm_qnn_evaluation which can also be used for perf benchmarking. Set this flag to True (https://github.com/Edgecortix-Inc/pytorch_quantization/blob/master/tvm_qnn_evaluation/imagenet_test.py#L82) and pick your target here (https://github.com/Edgecortix-Inc/pytorch_quantization/blob/master/tvm_qnn_evaluation/test_util.py#L63):

* For Skylake with AVX-512 support, the target should be "llvm -mcpu=skylake-avx512"
* For Cascade Lake, "llvm -mcpu=cascadelake"

Maybe @anijain2305 can give more comments.
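For reference, here is a minimal sketch of what the compile-and-benchmark flow looks like with these targets. This is not the linked script; the model file and input name are placeholders, but the target strings are the ones listed above:

```python
import numpy as np
import torch
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Load a quantized, TorchScript-traced model. The file name is a
# placeholder -- any quantized torchscript module works the same way.
model = torch.jit.load("quantized_model.pt").eval()

input_name = "input"            # arbitrary name for the graph input
input_shape = (1, 3, 224, 224)

mod, params = relay.frontend.from_pytorch(model, [(input_name, input_shape)])

# Pick the target that matches your CPU:
#   "llvm -mcpu=skylake-avx512"  -> Skylake with AVX-512
#   "llvm -mcpu=cascadelake"     -> Cascade Lake (adds VNNI)
target = "llvm -mcpu=skylake-avx512"

with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

dev = tvm.cpu(0)
rt = graph_executor.GraphModule(lib["default"](dev))
rt.set_input(input_name, np.random.randn(*input_shape).astype("float32"))

# Time the end-to-end run.
ftimer = rt.module.time_evaluator("run", dev, number=10, repeat=3)
print("mean inference time: %.2f ms" % (np.mean(ftimer().results) * 1000))
```

The same measurement with target "llvm -mcpu=core-avx2" gives you the AVX2 baseline for comparison.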