quic-sanirudh commented on PR #15772:
URL: https://github.com/apache/tvm/pull/15772#issuecomment-1730910882

   > I think it's worth discussing if we need a full-blown QNN "dialect" in 
Relax, like Relay.
   > 
   > If we want to have other qnn ops like conv2d, dense etc in the future, 
having a dialect, with a separate "canonicalization" step, is probably 
necessary. But the industry is moving toward "QDQ representation" for 
quantization model representation, which only requires quantize and dequantize 
ops.
   > 
   > If Relax also adopts QDQ for quantized representation, we don't need a 
large dialect like Relay does. We can define quantize / dequantize as a normal 
relax ops, and the logic in the canonicalize pass can go directly to the 
legalize pass.
   > 
   > cc @tqchen @MasterJH5574 @Hzfengsy this PR is motivated by activation 
quantization (smooth quant, in particular).
   
   I agree that it would be good to discuss the design of quantization support in 
Relax before introducing this.
   
   One approach we've been considering is to introduce a new `QuantTensor` type 
that represents a quantized tensor and carries the `zero_point` and `scale` 
params as attributes. This way, the quantization params stay attached to the 
tensor they apply to, and we could potentially avoid introducing new quantized 
ops entirely (just reuse the existing ops with `QuantTensor` arguments to 
represent a quantized op). This could eventually be lowered (canonicalized) to 
regular ops later in the flow if needed.
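To make the idea concrete, here is a minimal NumPy sketch of what such a type could look like. The names `QuantTensor`, `quantize`, and `dequantize` are hypothetical illustrations of the proposal, not existing TVM APIs; the math follows the standard affine QDQ convention `real = (q - zero_point) * scale`.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class QuantTensor:
    """Hypothetical type pairing integer data with its quantization params."""
    data: np.ndarray   # quantized values, e.g. int8
    scale: float
    zero_point: int

    def dequantize(self) -> np.ndarray:
        # Affine dequantization: real = (q - zero_point) * scale
        return (self.data.astype(np.float32) - self.zero_point) * self.scale


def quantize(x: np.ndarray, scale: float, zero_point: int) -> QuantTensor:
    # Affine quantization with rounding and int8 saturation
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return QuantTensor(q, scale, zero_point)


x = np.array([0.0, 0.5, -0.5], dtype=np.float32)
qt = quantize(x, scale=0.05, zero_point=0)
print(qt.data)          # int8 representation
print(qt.dequantize())  # approximately recovers x
```

An existing op like `add` or `conv2d` taking `QuantTensor` arguments would then denote its quantized variant, with a later canonicalization pass expanding it into dequantize → float op → quantize (or a fused integer kernel).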

