jwfromm opened a new pull request #7613:
URL: https://github.com/apache/tvm/pull/7613


This PR adds `simulated_quantize` and `simulated_dequantize` to the QNN 
library in Relay. These operators are primarily meant to support the pass-based 
quantization framework proposed [in this Discuss 
post](https://discuss.tvm.apache.org/t/rfc-quantization-quantization-in-tvm/9161).
However, the new ops are self-contained enough to land in their own PR and can 
be useful for other applications. The obvious benefit of simulated QNN ops is 
that they mimic real quantization while computing in floating point. The more 
interesting benefit of this approach is that it allows switching between 
per-channel and scalar QNN parameters and changing the datatype **without 
recompilation**. This saves substantial compute time when doing calibration or, 
eventually, quantization-aware training.
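
To make the idea concrete, here is a minimal NumPy sketch of what simulating quantization in floating point can look like. It is not the implementation in this PR: the helper `_broadcastable`, the `QRANGES` table, and the argument names are assumptions made for illustration. The point it shows is that the datatype and the scalar or per-channel parameters are ordinary runtime inputs, so changing them does not require rebuilding anything.

```python
import numpy as np

# Hypothetical table of integer ranges, looked up at run time by dtype name.
QRANGES = {"int8": (-128, 127), "uint8": (0, 255)}


def _broadcastable(param, ndim, axis):
    """Reshape a scalar or 1-D per-channel parameter so it broadcasts along `axis`."""
    param = np.asarray(param, dtype=np.float32)
    if param.ndim == 0:
        return param  # scalar parameters broadcast as-is
    shape = [1] * ndim
    shape[axis] = param.shape[0]
    return param.reshape(shape)


def simulated_quantize(x, scale, zero_point, dtype="int8", axis=-1):
    """Round and clip like real quantization, but keep the result in float32."""
    qmin, qmax = QRANGES[dtype]
    scale = _broadcastable(scale, x.ndim, axis)
    zero_point = _broadcastable(zero_point, x.ndim, axis)
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.float32)


def simulated_dequantize(q, scale, zero_point, axis=-1):
    """Map simulated quantized values back to real values."""
    scale = _broadcastable(scale, q.ndim, axis)
    zero_point = _broadcastable(zero_point, q.ndim, axis)
    return ((q - zero_point) * scale).astype(np.float32)


x = np.array([[0.5, -1.0], [2.0, 300.0]], dtype=np.float32)

# Same function, three configurations, chosen purely by runtime arguments.
q_scalar = simulated_quantize(x, 0.5, 0, "int8")
q_channel = simulated_quantize(x, [0.5, 1.0], [0, 0], "int8", axis=1)
q_uint8 = simulated_quantize(x, 0.5, 0, "uint8")
x_restored = simulated_dequantize(q_scalar, 0.5, 0)
```

In this sketch the out-of-range value 300.0 saturates at the upper bound of the chosen datatype, which is the kind of quantization error a calibration loop would want to observe while sweeping parameters.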



