Thanks @kindlehe @masahi
Masa explained it correctly. For a long time, processors had higher FP32 throughput than int8 throughput, so it is not fair to assume that quantization will give you performance benefits on every machine. Check Intel VNNI, Nvidia DP4A and Tensor Cores, and Arm DOT instructions; these are hardware vendor efforts to make int8 throughput higher than FP32 throughput.

I do not expect the tutorial to be very different from FP32 compilation, but I do see value in writing one just to get people started. I can take care of that. @masahi Do you want to first have a PyTorch FP32 tutorial, and then maybe we can build on top of that as part 2?

---

[Visit Topic](https://discuss.tvm.ai/t/is-there-any-speed-comparison-of-quantization-on-cpu/6256/4) to respond.
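As a quick way to see the kind of comparison discussed in this thread, here is a minimal sketch that times an FP32 model against its dynamically quantized int8 version in PyTorch. This assumes `torch` is installed; the toy `Sequential` model, sizes, and iteration counts are made up for illustration, and whether int8 actually wins depends on your CPU (e.g. whether it has VNNI or DOT instructions):

```python
# Hedged sketch: compare FP32 vs dynamic int8 inference latency on CPU.
# The model and sizes are arbitrary examples; int8 is NOT guaranteed to
# be faster unless the CPU has fast int8 paths (VNNI, Arm DOT, etc.).
import time
import torch

model_fp32 = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 512),
).eval()

# Dynamic quantization: weights are quantized to int8, activations are
# quantized on the fly at runtime.
model_int8 = torch.quantization.quantize_dynamic(
    model_fp32, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(64, 512)

def bench(model, iters=50):
    """Return average latency in milliseconds per forward pass."""
    with torch.no_grad():
        model(x)  # warm-up run
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        return (time.perf_counter() - start) / iters * 1e3

fp32_ms = bench(model_fp32)
int8_ms = bench(model_int8)
print(f"FP32: {fp32_ms:.3f} ms/iter, int8: {int8_ms:.3f} ms/iter")
```

On machines without fast int8 instructions, the int8 number can come out equal to or worse than FP32, which is exactly the point above.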