quic-sanirudh commented on PR #15772: URL: https://github.com/apache/tvm/pull/15772#issuecomment-1730910882
> I think it's worth discussing if we need a full-blown QNN "dialect" in Relax, like Relay.
>
> If we want to have other qnn ops like conv2d, dense etc. in the future, having a dialect with a separate "canonicalization" step is probably necessary. But the industry is moving toward the "QDQ representation" for quantized models, which only requires quantize and dequantize ops.
>
> If Relax also adopts QDQ for its quantized representation, we don't need a large dialect like Relay's. We can define quantize / dequantize as normal Relax ops, and the logic in the canonicalize pass can go directly into the legalize pass.
>
> cc @tqchen @MasterJH5574 @Hzfengsy

This PR is motivated by activation quantization (SmoothQuant, in particular). I agree that it would be good to discuss the design of quantization support in Relax before introducing this. One approach we've been thinking about is to introduce a new `QuantTensor` type that represents a quantized tensor and carries the `zero_point` and `scale` params as attributes. This way, the quantization params would be maintained as part of the tensor they apply to, and we could avoid introducing new quantized ops entirely: the existing ops would simply take `QuantTensor` arguments to represent their quantized counterparts. This could eventually be lowered (canonicalized) to regular ops later in the flow if needed.
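To make the two ideas in this thread concrete, here is a minimal plain-Python sketch (not actual TVM Relax API; `QuantTensor`, `quantize`, and `dequantize` are hypothetical names) of QDQ-style affine quantization, where the `scale` and `zero_point` params travel with the tensor they apply to:

```python
# Hypothetical sketch only -- illustrates the QDQ idea (just quantize /
# dequantize ops) and a QuantTensor that bundles its quantization params.
from dataclasses import dataclass


@dataclass
class QuantTensor:
    """A quantized tensor carrying its affine quantization params."""
    data: list          # quantized integer values (e.g. int8 range)
    scale: float
    zero_point: int


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Affine quantize: q = clamp(round(x / scale) + zero_point, qmin, qmax)."""
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in x]
    return QuantTensor(q, scale, zero_point)


def dequantize(qt):
    """Inverse map: x_hat = (q - zero_point) * scale."""
    return [(q - qt.zero_point) * qt.scale for q in qt.data]
```

Under this representation, a quantized conv2d or dense needs no dedicated qnn op: a legalize pass can rewrite the regular op over `QuantTensor` arguments into dequantize / compute / quantize (or into an integer kernel) using the attached params.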
