anijain2305 commented on a change in pull request #4798: [QNN] Optimize
lowering for requantize and FixedPointMultiply.
URL: https://github.com/apache/incubator-tvm/pull/4798#discussion_r374425432
##########
File path: src/relay/qnn/op/requantize.cc
##########
@@ -103,7 +103,11 @@ Expr RequantizeLower(const Expr& input_tensor, const Expr& input_scale,
shifted_int64_t = Add(Cast(output_zero_point, hp_dtype), scaled_int64_t);
}
- // 4) Clip to the out_dtype min/max.
+ // 4) Clip to the out_dtype min/max. Skip clipping if out_dtype is Int32. The fixed
+ // point multiplication keeps the value in int32 range.
+ if (out_dtype == DataType::Int(32)) {
+ return Cast(shifted_int64_t, out_dtype);
+ }
Review comment:
Definitely, happy to explain :)
We approximate the floating point computation here with a fixed point
computation. This is done by representing the requantize_scale
(input_scale/output_scale) as an int32 whose decimal point sits between the 1st
and 2nd bits, i.e., it represents a number between 0.5 and 1. We then multiply
this fixed point number with the quantized tensor (another int32 tensor). To
keep the precision high, we perform the multiplication in int64. But we can
safely say that the result is still a fixed point int64 number whose value,
once the fractional bits are discarded, fits in int32 range. We then perform a
right shift (with rounding) to recover that value.
So, if the requantize scale is less than 1, we can safely assume that the
result will be within int32 range. (I forgot to add that check, but let me add
it in a second commit.)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services