[GitHub] [tvm] guberti commented on a diff in pull request #13242: [microTVM] [WIP] Modernize Arm Cortex-M convolution schedules

GitBox Sun, 20 Nov 2022 22:40:52 -0800


guberti commented on code in PR #13242:
URL: https://github.com/apache/tvm/pull/13242#discussion_r1027626205



##########
python/tvm/topi/arm_cpu/mprofile/dsp/micro_kernel/tensordot.py:
##########
@@ -15,141 +15,294 @@
 # specific language governing permissions and limitations
 # under the License.
 """Computes a "jumpy tensordot" operator, which can be used to tensorize many common operators
-including regular conv2d, depthwise conv2d, and grouped conv2d provided the data and kernel layouts
-are the optimal ones. When groups=1, the optimal data layout is NHWC and kernel layout is OHWI. When
-this is a depthwise convolution, the optimal data layout is NCHW and kernel layout is OIHW."""
+including regular conv2d, depthwise conv2d, and grouped conv2d for some data and kernel layouts.
+For regular convolution, use data layout NHWC and kernel layout OHWI. For depthwise convolution,
+use data layout NCHW and kernel layout OIHW."""
 
+from itertools import chain
 import textwrap
+from typing import Iterator, Tuple
 
-from tvm import te, tir
 
-from .common import num_simd_lanes_per_word
+def _get_c_function_name(split_size, dimensions, offsets, x_strides):
+    """Gets the C function name of the tensordot function. We do not need a suffix, as the generated
+    function will have an #include guard. Unlike other microTVM operators, _get_c_function_name is
+    never called externally."""
+    tensor_w, kernel_h, kernel_w = dimensions
+    return (
+        f"tensordot_opt_x{split_size}_int16_w{tensor_w}_"
+        + f"{kernel_h}x{kernel_w}_"
+        + "".join(map(str, offsets))
+        + (f"_{x_strides[0]}_{x_strides[1]}" if split_size > 1 else "")
+    )
 
 
-def _get_func_name(in_dtype, tensor_h, jump, tensor_w, suffix):
-    """Gets the C function name of the tensordot function."""
-    return f"tensordot_{in_dtype}_h{tensor_h}_j{jump}_w{tensor_w}_{suffix}"
+def _init_biased_accumulators(split_size):
+    """Addition is commutative, so we could add the bias before, during, or after performing our
+    multiply-accumulate operations. It "costs" one cycle either way - if done at the beginning we
+    can't use a SMULXY trick to set sum_i to zero for "free", and if done at the end it doesn't
+    combine with anything. However, doing it at the beginning frees up a register/prevents needing

Review Comment:
   The order of bias addition does not change the overflow behavior. This comment is just stating we could do the additions as:
   ```math
   A_1 B_1 + A_2 B_2 + \cdots + A_n B_n + \text{bias}
   ```
   OR as:
   ```math
   \text{bias} + A_1 B_1 + A_2 B_2 + \cdots + A_n B_n
   ```
   I've changed the wording a bit to make this clearer.
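   To make the point of the review comment concrete, here is a small Python sketch (not code from the PR) that emulates both orderings with 32-bit accumulation. It assumes the accumulator wraps modulo 2^32 on overflow rather than saturating; under that assumption the additions are associative and commutative, so both orderings produce identical bits even when intermediate sums overflow. The helper names `wrap_int32`, `dot_bias_first`, and `dot_bias_last` are hypothetical, and `dot_bias_last` only loosely mirrors the SMULXY idea from the docstring (the first product initializes the accumulator, so no separate zeroing is needed).

```python
# A minimal sketch, assuming 32-bit accumulation that wraps on overflow
# (two's-complement modular arithmetic). Function names are hypothetical.
from typing import Sequence

INT32_MIN = -(2**31)


def wrap_int32(x: int) -> int:
    """Reduce an arbitrary Python int to the signed 32-bit value it wraps to."""
    return (x - INT32_MIN) % 2**32 + INT32_MIN


def dot_bias_first(a: Sequence[int], b: Sequence[int], bias: int) -> int:
    """bias + A_1 B_1 + ... + A_n B_n: the accumulator starts at the bias,
    so every product has to be accumulated onto it."""
    acc = wrap_int32(bias)
    for x, y in zip(a, b):
        acc = wrap_int32(acc + x * y)
    return acc


def dot_bias_last(a: Sequence[int], b: Sequence[int], bias: int) -> int:
    """A_1 B_1 + ... + A_n B_n + bias: the first product initializes the
    accumulator, and the bias costs one extra addition at the end.
    Assumes a and b are non-empty."""
    acc = wrap_int32(a[0] * b[0])
    for x, y in zip(a[1:], b[1:]):
        acc = wrap_int32(acc + x * y)
    return wrap_int32(acc + bias)
```

   If the accumulation saturated instead of wrapping, the intermediate clamping would make the result order-dependent, so the equivalence shown here relies on the wrapping assumption.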



