masahi edited a comment on pull request #9164:
URL: https://github.com/apache/tvm/pull/9164#issuecomment-937631623


   Hmm, interesting. I never thought about doing constant folding on partitioned 
functions. My use cases have always involved constant folding on `main`, 
before partitioning. For example, that was the case for the PyTorch frontend before 
#9135, which always produced something like `qnn.quantize(const_weight_fp32)`. 
The other case is QNN produced by the 
[FakeQuantizationToInteger](https://github.com/apache/tvm/blob/4ffbdcd0aaed4f382f06c6a9e2b2d048b6abdaa9/src/relay/transforms/fake_quantization_to_integer.cc)
 pass, which also generates many `qnn.quantize` ops with constant weights.
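
   To make concrete what folding `qnn.quantize(const_weight_fp32)` amounts to, here is a minimal pure-Python sketch of the arithmetic. It is not TVM code: the helper name `quantize_const`, the per-tensor int8 range, and the use of Python's built-in `round` are assumptions made for illustration only.

```python
def quantize_const(weights, scale, zero_point, qmin=-128, qmax=127):
    """Affine-quantize a list of fp32 constants (hypothetical helper).

    Assumes per-tensor scale/zero_point and an int8 output range.
    Rounding uses Python's round(), which may differ from the
    rounding mode of a real compiler pass on exact ties.
    """
    folded = []
    for w in weights:
        q = round(w / scale) + zero_point
        # Saturate to the representable integer range.
        folded.append(max(qmin, min(qmax, q)))
    return folded


# A constant fp32 weight, as a frontend might emit it.
const_weight_fp32 = [-1.0, -0.3, 0.0, 0.6, 100.0]

# "Folding" means computing this once at compile time, so the graph
# carries the integer constant instead of a quantize op on a constant.
folded_weight = quantize_const(const_weight_fp32, scale=0.5, zero_point=0)
print(folded_weight)
```

   Once the weight is folded like this, the graph holds a plain integer constant rather than a quantize op applied to an fp32 constant.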
   
   In 2), if we run legalization on partitioned functions, wouldn't that 
decompose all QNN ops? I could no longer easily extract qparams, for example. 
I needed to retain QNN ops all the way until I translated them to the external 
IR, so running legalization has never been an option for me. Maybe I'm missing 
something.



