masahi edited a comment on pull request #9164: URL: https://github.com/apache/tvm/pull/9164#issuecomment-937631623
Hmm, interesting. I never thought about doing constant folding on partitioned functions. My use cases have always involved constant folding on `main`, before partitioning. For example, that was the case in the PyTorch frontend before #9135, which always produced something like `qnn.quantize(const_weight_fp32)`. The other case is QNN produced by the [FakeQuantizationToInteger](https://github.com/apache/tvm/blob/4ffbdcd0aaed4f382f06c6a9e2b2d048b6abdaa9/src/relay/transforms/fake_quantization_to_integer.cc) pass, which also generates many `qnn.quantize` ops with constant weights.

In 2), if we run legalization on partitioned functions, wouldn't that decompose all QNN ops? For example, I could no longer easily extract qparams. I need to retain QNN ops all the way, so running legalization has never been an option for me. Maybe I'm missing something.
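To make the `qnn.quantize(const_weight_fp32)` case concrete, here is a minimal toy sketch in plain Python. It is not TVM code: the `Const`, `Var`, and `Quantize` classes and the `fold_constants` helper are hypothetical stand-ins for Relay expressions and a folding pass, and the int8 range and rounding are assumptions chosen for illustration. It only shows the idea that a quantize applied to a constant can be evaluated ahead of time, while a quantize applied to a runtime input stays in the graph with its qparams intact.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Const:
    """Hypothetical constant node holding a flat tuple of values."""
    values: Tuple[float, ...]
    dtype: str


@dataclass(frozen=True)
class Var:
    """Hypothetical runtime input; its value is unknown at compile time."""
    name: str


@dataclass(frozen=True)
class Quantize:
    """Hypothetical stand-in for a quantize op with its qparams."""
    data: "Expr"
    scale: float
    zero_point: int


Expr = Union[Const, Var, Quantize]


def quantize_values(values, scale, zero_point):
    # Toy affine quantization to an assumed int8 range:
    # q = clamp(round(x / scale) + zero_point, -128, 127)
    return tuple(
        max(-128, min(127, round(v / scale) + zero_point)) for v in values
    )


def fold_constants(expr: Expr) -> Expr:
    """Evaluate Quantize nodes whose input is constant; keep the rest."""
    if isinstance(expr, Quantize):
        data = fold_constants(expr.data)
        if isinstance(data, Const):
            # The input is known, so the op can be computed ahead of time.
            folded = quantize_values(data.values, expr.scale, expr.zero_point)
            return Const(folded, "int8")
        # The input is only known at runtime: keep the op and its qparams.
        return Quantize(data, expr.scale, expr.zero_point)
    return expr


weight = Quantize(Const((0.5, -1.0, 0.25), "float32"), scale=0.25, zero_point=0)
activation = Quantize(Var("x"), scale=0.25, zero_point=0)

folded_weight = fold_constants(weight)          # becomes an int8 Const
folded_activation = fold_constants(activation)  # stays a Quantize node
```

In this sketch the quantize on the activation survives folding, so its scale and zero point can still be read off the node afterwards, which is the property that matters when QNN ops need to be retained.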
