masahi commented on code in PR #12499:
URL: https://github.com/apache/tvm/pull/12499#discussion_r1008452745
##########
python/tvm/topi/hexagon/utils.py:
##########
@@ -235,6 +241,19 @@ def get_fixed_point_value(flp: float, dtype: str = "int16") -> Tuple[int, int]:
best scaling factor for 'int16' type that can be used to convert the floating-point value to
fixed-point with the least amount of precision loss.
+
+ Here is a more rigorous explanation of the above, for non-negative scale values, which are of
+ interest. M < 2, so M * 2^(E-Bias+x) < 2^(E-Bias+x+1) [Note: LHS is a fraction, RHS an int]
+ => round(M * 2^(E-Bias+x)) <= 2^(E-Bias+x+1) [Note the "<=", not "<"]
+ We want x s.t. round(M * 2^(E-Bias+x)) <= 2^15 - 1
+ We know round(M * 2^(E-Bias+x)) <= 2^(E-Bias+x+1)
+ It will be sufficient to choose x s.t. 2^(E-Bias+x+1) <= 2^15 - 1
+ That is, max x s.t. 2^(E-Bias+x+1) < 2^15
+ E-Bias+x+1 < 15
+ E-Bias+x+1 <= 14
+ Max x will make E-Bias+x+1 = 14
+ x = 13 - E + Bias
Review Comment:
cc @ibsidorenko - I'm curious how the requantize operation done in QC "slice
ops" (such as this PR) compares to the one done by QNN canonicalization.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]