jwfromm opened a new pull request #7613: URL: https://github.com/apache/tvm/pull/7613
This PR adds `simulated_quantize` and `simulated_dequantize` to the QNN library in Relay. These operators are primarily meant to support the pass-based quantization framework proposed [in this Discuss post](https://discuss.tvm.apache.org/t/rfc-quantization-quantization-in-tvm/9161). However, they can be cleanly broken out into their own PR and are useful for other applications as well. The obvious benefit of simulated QNN ops is that they mimic real quantization in floating point. The more interesting benefit of this approach is that it allows switching between per-channel and scalar QNN parameters and changing the datatype **without recompilation**. This yields major compute-time savings when doing calibration or, eventually, quantization-aware training.
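To illustrate what "mimicking real quantization in floating point" means, here is a minimal NumPy sketch of the round/shift/clip behavior these ops model. This is an illustrative approximation, not the actual TVM implementation; the function names and the fixed int8 clip range are assumptions for the example.

```python
import numpy as np

def simulated_quantize(data, scale, zero_point, qmin=-128, qmax=127):
    """Mimic integer quantization in floating point: scale, round,
    shift by the zero point, and clip to the target integer range,
    but keep the result as float32 (no dtype change, no recompile)."""
    q = np.round(data / scale) + zero_point
    return np.clip(q, qmin, qmax).astype("float32")

def simulated_dequantize(qdata, scale, zero_point):
    """Invert the simulated quantization, again staying in float32."""
    return ((qdata - zero_point) * scale).astype("float32")

# Round-tripping shows the quantization error (including saturation):
x = np.array([0.1, -0.25, 1.3], dtype="float32")
scale, zero_point = 0.01, 0
roundtrip = simulated_dequantize(
    simulated_quantize(x, scale, zero_point), scale, zero_point
)
# 1.3 saturates at qmax (127 * 0.01 = 1.27); the others survive exactly.
```

Because `scale` and `zero_point` are ordinary tensor inputs rather than compile-time attributes, swapping scalar parameters for per-channel arrays (or simulating a different integer range) only changes runtime inputs, which is what avoids recompilation during calibration.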
