mbrookhart opened a new pull request #8126: URL: https://github.com/apache/tvm/pull/8126
Recently, we discovered that tf2onnx exports some int8 graphs as fake-quantized/QAT models in ONNX, i.e., int8 ops are exported as dequantize->op->quantize sequences. This PR introduces a pass to convert those graphs into direct int8 ops inside Relay. I've tested the correctness of the resulting models on Inceptionv1 and ssd-mobilenet-v1 from the TensorFlow Lite model zoo, imported via ONNX. Follow-up work will analyze further models for more operations to include in this pass. cc @AndrewZhaoLuo @masahi @jwfromm
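
To illustrate why the rewrite is valid, here is a minimal NumPy sketch (not the actual Relay pass, and the function names are hypothetical) showing that for an op like ReLU, the fake-quantized pattern dequantize->relu->quantize is equivalent to a single integer op on the quantized values:

```python
import numpy as np

def dequantize(q, scale, zp):
    # Affine dequantization: real = scale * (q - zero_point)
    return scale * (q.astype(np.int32) - zp)

def quantize(x, scale, zp):
    # Affine quantization back to uint8
    return np.clip(np.round(x / scale) + zp, 0, 255).astype(np.uint8)

def fake_quant_relu(q, scale, zp):
    # The exported pattern: dequantize -> relu -> quantize
    return quantize(np.maximum(dequantize(q, scale, zp), 0.0), scale, zp)

def int_relu(q, zp):
    # The direct integer equivalent: ReLU in the quantized
    # domain is just max(q, zero_point) when scale > 0
    return np.maximum(q, zp).astype(np.uint8)

q = np.array([0, 10, 128, 200, 255], dtype=np.uint8)
scale, zp = 0.1, 128
assert np.array_equal(fake_quant_relu(q, scale, zp), int_relu(q, zp))
```

The pass applies the same idea at the graph level: it matches dequantize->op->quantize regions and replaces them with the corresponding integer ops, so no float computation remains in those regions.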
