[GitHub] [tvm] jverma-quic commented on a diff in pull request #12340: [TOPI][Hexagon] Implement quantized avgpool

GitBox Fri, 19 Aug 2022 08:10:32 -0700


jverma-quic commented on code in PR #12340:
URL: https://github.com/apache/tvm/pull/12340#discussion_r950295180



##########
python/tvm/topi/hexagon/utils.py:
##########
@@ -150,4 +157,126 @@ def get_layout_transform_fn(layout):
         return nc_2048_2d
     if layout == "nhwc-8h8w32c-2d":
         return nhwc_8h8w32c_2d
+    if layout == "n11c-2048c-2d":
+        return n11c_2048c_2d
     raise RuntimeError(f"Unexpected layout '{layout}'")
+
+
+def get_fixed_point_value(flp: float, dtype: str = "int16"):
+    """
+    Return fixed-point value and the corresponding log2 of the scale factor used to compute
+    this value.
+
+    Parameters
+    ----------
+    flp : float
+        Floating-point value to be converted
+    dtype : str
+        Type of the resulting fixed-point value. By default, it's set to "int16"
+
+    Returns
+    -------
+    fixed_point_value : int
+        Fixed-point value for the given floating-point value
+    exp_scale_factor : int
+        log2 of the scale factor
+
+    Convert floating-point value into fixed-point number. This is done by
+    multiplying the value by a scaling factor and then rounding it to the nearest
+    integer value.
+
+    As per IEEE-754 standard, a floating-point value can be represented as follows
+    [see: https://en.wikipedia.org/wiki/IEEE_754-1985]:
+        (-1)^S * M * 2^(E-Bias)
+
+    Here,
+    * S is the sign bit (0 or 1).
+    * M is the mantissa. It's composed of an implicit 1 for the normalized floating-point
+      values or 0 for the denormalized values, and the fraction part. This ensures that
+      mantissa is always within [0, 2) range. Please note that this function doesn't
+      handle denormalized values.
+    * E is the exponent.
+
+    In single precision, 23 bits are used to represent the fraction part of
+    the mantissa (and therefore, '23' shows up in one of the computations below) and
+    8 bits are used for the exponent. Since exponent field needs to represent both
+    positive and negative values, a bias (127 for single precision) is added to the actual
+    value. Therefore, to compute the actual exponent, 127 must be subtracted from the stored
+    value.
+
+    As mentioned above, to find the corresponding fixed-point number, we multiply the
+    value by a scaling factor and then round it to the nearest integer. The scaling factor
+    is chosen to be a power of 2 and it's the largest value that can be safely multiplied
+    by the floating-point value, without causing the resulting value to overflow the range
+    of the integer type used to represent the fixed-point value.
+
+    So, if we assume the scaling factor to be 2^x, the resulting fixed-point value will be:
+        round((-1)^S * (M) * 2^(E-Bias) * 2^x)
+
+    This can be simplified to:
+        round((-1)^S * M * 2^(E-Bias+x))
+
+    Now, if 'int16' is used for fixed-point value, then it has to be >= -(2 * 2^14)
+    and <= (2 * 2^14) - 1. Since M (Mantissa) is always < 2, in order for the fixed-point value
+    to be within this range, 2^(E - Bias + x) must be <= 2^14 - 1.
+    And, if we ignore -1, (E - Bias + x) should be <= 14. Note: if mantissa gets too close to 2,
+    this will cause the resulting value to go out of range and require it to be saturated.
+    In the following implementation, we perform range check and adjust the scale to avoid
+    saturation.
+    For most cases, 2^x, where x = 14 - (E - Bias) or 14 - (E - 127) for single precision, is the
+    best scaling factor for 'int16' type that can be used to convert the floating-point value to
+    fixed-point with the least amount of precision loss.
+
+    Additional notes on various floating-point values:
+    ------------------------------------------------
+    1) Denormalized values: Can't be represented as fixed-point - causes assertion failure
+    2) NaN and INF: assertion failure
+    """
+
+    def within_range(val, dtype):
+        if dtype == "int16":
+            return -32768 <= val <= 32767
+        raise RuntimeError(f"Unsupported dtype '{dtype}'")
+
+    # Make sure that 'flp' isn't NaN or infinity
+    if math.isnan(flp) or math.isinf(flp):
+        raise RuntimeError("Can not handle NaN or INF")
+
+    flp_f = struct.pack("f", flp)
+    flp_i = struct.unpack("I", flp_f)
+    exp_stored_value = (flp_i[0] >> 23) & 0xFF
+
+    if exp_stored_value == 0:
+        raise RuntimeError("Can not handle denormalized values")

Review Comment:
   Sure, I'll elaborate on this. Thanks!



##########
python/tvm/topi/hexagon/utils.py:
##########
@@ -150,4 +157,126 @@ def get_layout_transform_fn(layout):
         return nc_2048_2d
     if layout == "nhwc-8h8w32c-2d":
         return nhwc_8h8w32c_2d
+    if layout == "n11c-2048c-2d":
+        return n11c_2048c_2d
     raise RuntimeError(f"Unexpected layout '{layout}'")
+
+
+def get_fixed_point_value(flp: float, dtype: str = "int16"):
+    """
+    Return fixed-point value and the corresponding log2 of the scale factor used to compute
+    this value.
+
+    Parameters
+    ----------
+    flp : float
+        Floating-point value to be converted
+    dtype : str
+        Type of the resulting fixed-point value. By default, it's set to "int16"
+
+    Returns
+    -------
+    fixed_point_value : int
+        Fixed-point value for the given floating-point value
+    exp_scale_factor : int
+        log2 of the scale factor
+
+    Convert floating-point value into fixed-point number. This is done by
+    multiplying the value by a scaling factor and then rounding it to the nearest
+    integer value.
+
+    As per IEEE-754 standard, a floating-point value can be represented as follows
+    [see: https://en.wikipedia.org/wiki/IEEE_754-1985]:
+        (-1)^S * M * 2^(E-Bias)
+
+    Here,
+    * S is the sign bit (0 or 1).
+    * M is the mantissa. It's composed of an implicit 1 for the normalized floating-point
+      values or 0 for the denormalized values, and the fraction part. This ensures that
+      mantissa is always within [0, 2) range. Please note that this function doesn't
+      handle denormalized values.
+    * E is the exponent.
+
+    In single precision, 23 bits are used to represent the fraction part of
+    the mantissa (and therefore, '23' shows up in one of the computations below) and
+    8 bits are used for the exponent. Since exponent field needs to represent both
+    positive and negative values, a bias (127 for single precision) is added to the actual
+    value. Therefore, to compute the actual exponent, 127 must be subtracted from the stored
+    value.
+
+    As mentioned above, to find the corresponding fixed-point number, we multiply the
+    value by a scaling factor and then round it to the nearest integer. The scaling factor
+    is chosen to be a power of 2 and it's the largest value that can be safely multiplied
+    by the floating-point value, without causing the resulting value to overflow the range
+    of the integer type used to represent the fixed-point value.
+
+    So, if we assume the scaling factor to be 2^x, the resulting fixed-point value will be:
+        round((-1)^S * (M) * 2^(E-Bias) * 2^x)
+
+    This can be simplified to:
+        round((-1)^S * M * 2^(E-Bias+x))
+
+    Now, if 'int16' is used for fixed-point value, then it has to be >= -(2 * 2^14)
+    and <= (2 * 2^14) - 1. Since M (Mantissa) is always < 2, in order for the fixed-point value
+    to be within this range, 2^(E - Bias + x) must be <= 2^14 - 1.
+    And, if we ignore -1, (E - Bias + x) should be <= 14. Note: if mantissa gets too close to 2,
+    this will cause the resulting value to go out of range and require it to be saturated.
+    In the following implementation, we perform range check and adjust the scale to avoid
+    saturation.
+    For most cases, 2^x, where x = 14 - (E - Bias) or 14 - (E - 127) for single precision, is the
+    best scaling factor for 'int16' type that can be used to convert the floating-point value to
+    fixed-point with the least amount of precision loss.
+
+    Additional notes on various floating-point values:
+    ------------------------------------------------
+    1) Denormalized values: Can't be represented as fixed-point - causes assertion failure

Review Comment:
   Converting denormalized values to fixed point would require a very large scale factor, which can't be represented using the available bits.
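
   To make the size of that scale factor concrete, here is a minimal sketch (not the code in this PR) that applies the docstring's formula x = 14 - (E - 127) to a few float32 values. The helper names `stored_exponent` and `log2_scale_for_int16` are hypothetical, and the explicit little-endian format strings are my choice for reproducibility; the PR itself uses the native "f" and "I" formats. What "available bits" refers to downstream is not stated in this thread, so the sketch only shows how large x gets.

```python
import struct


def stored_exponent(value: float) -> int:
    """Return the 8-bit biased exponent field of `value` encoded as float32."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return (bits >> 23) & 0xFF


def log2_scale_for_int16(value: float) -> int:
    """Apply x = 14 - (E - 127) from the docstring (hypothetical helper)."""
    return 14 - (stored_exponent(value) - 127)


# Ordinary normalized value: 0.75 = 1.5 * 2^-1, so E = 126 and x = 15.
x_normal = log2_scale_for_int16(0.75)
fixed = round(0.75 * 2**x_normal)  # 0.75 * 32768 = 24576, fits in int16

# Smallest normalized float32, 2^-126: E = 1, so x is already 140.
x_tiny = log2_scale_for_int16(2.0**-126)

# A denormalized float32 such as 2^-130 has a stored exponent of 0.
# Its implicit leading bit is 0, so the formula above no longer describes
# the value, and an even larger power of two would be needed.
e_denormal = stored_exponent(2.0**-130)
```

   For 2^-130, for example, reaching the same 2^14 magnitude would take a shift of 144, which is past what the formula yields for any normalized input.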



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.