masahi edited a comment on pull request #9164: URL: https://github.com/apache/tvm/pull/9164#issuecomment-937591028
@apeskov Please see PR https://github.com/apache/tvm/pull/9135. I understand why you want to do this, namely to constant-fold `quantize(weight_fp32)` in a QNN graph. Returning float32 weights from the PyTorch frontend and relying on Relay constant folding to recover the quantized weights was a design mistake on my part. You can now obtain quantized weights directly from the frontend (the quantization is done at the numpy level).

@manupa-arm Running lowering before constant folding is not acceptable when we want to keep the rest of the QNN graph intact (for BYOC) while selectively lowering constant subgraphs and evaluating them.
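To illustrate what "quantize at the numpy level" means, here is a minimal sketch of per-tensor affine quantization done directly on the weight array, so the frontend can emit an already-quantized constant instead of a `quantize(weight_fp32)` op that would need folding later. The function name, the example scale, and the zero point are hypothetical, not TVM API:

```python
import numpy as np

def quantize_weight(weight_fp32, scale, zero_point, dtype=np.int8):
    """Affine-quantize an fp32 weight tensor in numpy.

    Doing this at graph-construction time lets the frontend emit an
    int8 constant directly, instead of a quantize() op on an fp32
    constant that relies on Relay constant folding to remove.
    """
    qmin = np.iinfo(dtype).min
    qmax = np.iinfo(dtype).max
    # round-to-nearest, shift by the zero point, then saturate
    q = np.round(weight_fp32 / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(dtype)

# Hypothetical example: scale chosen so 1.0 maps just past int8 range
w = np.array([[-1.0, 0.5], [0.25, 1.0]], dtype=np.float32)
qw = quantize_weight(w, scale=0.0078125, zero_point=0)
# qw is now an int8 array; 1.0 saturates to 127
```

The downstream QNN ops then consume `qw` together with the recorded `scale` and `zero_point`, so no fp32 weight constant survives into the graph.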
