Thanks @kindlehe @masahi

Masa explained it correctly. For a long time, processors had higher FP32 throughput than int8 throughput, so it is not fair to assume that quantization will give you performance benefits on every machine. Check out Intel VNNI, Nvidia DP4A and tensor cores, and the ARM dot-product instructions; these are the hardware vendors' efforts to push int8 throughput above FP32 throughput.
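
For concreteness, here is a rough sketch (not from the thread) of the target strings you would hand TVM to reach those instructions. Exact option spellings can vary across TVM releases, so treat these as illustrative:

```python
# Illustrative target strings only; option spellings may differ across
# TVM releases, so double-check against your installed version.
import tvm

# Intel VNNI (AVX512-VNNI), available from Cascade Lake onward:
intel = tvm.target.Target("llvm -mcpu=cascadelake")

# NVIDIA DP4A / tensor cores, reached through the CUDA backend:
nvidia = tvm.target.Target("cuda")

# ARM dot-product instructions (ARMv8.2-A and later):
arm = tvm.target.Target("llvm -mtriple=aarch64-linux-gnu -mattr=+v8.2a,+dotprod")
```

Build the same quantized model against each of these and you will only see int8 beating FP32 where the hardware actually has the instructions.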

I do not expect a quantization tutorial to be very different from FP32 compilation, but I do see value in writing one just to get people started. I can take care of that. @masahi Do you want to have a PyTorch FP32 tutorial first, and then maybe we can build the quantization tutorial on top of that as part 2?
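
To give a flavor, the FP32 flow such a tutorial would walk through looks roughly like this. This is a sketch rather than the final tutorial, and some API details (e.g. the graph executor module name, `.numpy()` vs `.asnumpy()`) have shifted between TVM releases:

```python
# Rough sketch of FP32 PyTorch compilation in TVM.
import torch
import torchvision
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Trace a pretrained model so the Relay frontend can consume it
model = torchvision.models.resnet18(pretrained=True).eval()
inp = torch.randn(1, 3, 224, 224)
scripted = torch.jit.trace(model, inp)

# Convert to Relay and build for a plain FP32 CPU target
mod, params = relay.frontend.from_pytorch(scripted, [("input0", (1, 3, 224, 224))])
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)

# Run and fetch the result
dev = tvm.cpu()
rt = graph_executor.GraphModule(lib["default"](dev))
rt.set_input("input0", inp.numpy())
rt.run()
out = rt.get_output(0).numpy()
```

The quantization part 2 would then mostly swap the frontend import for a quantized model and pick one of the int8-capable targets above.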
