ekalda commented on code in PR #12969: URL: https://github.com/apache/tvm/pull/12969#discussion_r988867814
##########
python/tvm/topi/arm_cpu/mprofile/dsp/tensordot_conv2ds.py:
##########

@@ -0,0 +1,276 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Implementations of several conv2d variations, all tensorized using tensordot and optimized for
+Cortex-M DSP. Currently contains a standard conv2d and depthwise conv2d implementation, but could be
+extended to add a grouped conv2d operator. Due to the way we tensorize, this schedule ONLY works
+when the data and kernel layouts are NCHWxc and OIHWxi respectively, where x is the number of
+input channels divided by the number of groups."""
+
+import random
+import string
+from typing import Union, Tuple
+
+from tvm import te
+from tvm.tir import indexdiv, indexmod
+from tvm.topi.utils import traverse_inline
+from tvm.topi.nn.pad import pad
+
+from .micro_kernel.tensordot import (
+    make_intrin_tensordot,
+    tensordot_impl,
+)
+
+
+def _unpack_2d_argument(argument: Union[int, Tuple]) -> Tuple:
+    if isinstance(argument, int):
+        return (argument, argument)
+    assert len(argument) == 2
+    return argument
+
+
+def _check_no_dilation(dilation: Union[int, Tuple]) -> None:
+    """Takes a dilation argument as an integer or tuple, and makes sure both dimensions are 1.
+
+    Dilation prevents us from using DSP instructions, so this schedule can't work (aside from the
+    niche case where dilation_h == stride_h and dilation_w == stride_w, which is rare enough we
+    probably don't need to support it)."""
+
+    dilation_h, dilation_w = _unpack_2d_argument(dilation)
+    assert dilation_h == dilation_w == 1
+
+
+def _unpack_padding(padding: Tuple) -> Tuple:
+    assert isinstance(padding, tuple)
+    if len(padding) == 2:
+        (pad_up, pad_down), (pad_left, pad_right) = padding
+    else:
+        pad_up, pad_left, pad_down, pad_right = padding
+    return pad_up, pad_left, pad_down, pad_right
+
+
+def _pad_if_needed(data: te.tensor.Tensor, layout: str, padding: Tuple) -> te.tensor.Tensor:
+    """Performs padding on a te.tensor.Tensor object if necessary. If padding = (0, 0, 0, 0), the
+    input tensor is returned unmodified. We only care about tuples here - "VALID" and "SAME"
+    padding will be converted by the TFLite importer if present."""
+
+    pad_up, pad_left, pad_down, pad_right = padding
+    if not any(padding):
+        return data
+
+    # We want to pad the "H" and "W" columns, and their position depends on the layout
+    pad_before, pad_after = [0, 0, 0, 0], [0, 0, 0, 0]
+    pad_before[layout.index("H")] = pad_up
+    pad_before[layout.index("W")] = pad_left
+    pad_after[layout.index("H")] = pad_down
+    pad_after[layout.index("W")] = pad_right
+    return pad(data, pad_before, pad_after, name="padded_data")
+
+
+def _compute_output_dim(data_dim, kernel_dim, pad_before, pad_after, stride) -> int:
+    return (data_dim - kernel_dim + pad_before + pad_after) // stride + 1
+
+
+def _compute_offset(data_dim, kernel_dim, pad_before, pad_after, stride) -> int:
+    """Computes offsets to "prefer" the bottom right corner.
+    This is done to match TensorFlow's

Review Comment:
   The numerical alignment question is really interesting - AFAIK TVM doesn't claim to implement any numerical standard; the accuracy requirement is essentially at the level of "if it says in the end that it's a cat, it's good enough". That's true, though, for the whole ML compilation landscape, especially for server workloads. It's a bit unfortunate, but that's where ML compilation is at the moment :)

   Embedded ML compilation is starting to feel the burn of the absence of a standard a bit more, though, since it is becoming relevant to functional safety, where bit accuracy is important. TVM's lack of a numerical standard has given us a lot of headache in the past, and on a few occasions the question of whether we need separate Relay op implementations/compute definitions to match the behaviours of various frameworks has been on the table. I suppose we will have to do it at some point, but that's probably out of scope for this patch.

   I personally think that matching the behaviour of the Cortex-M schedules to TFLite is not a bad idea, because:
   * Most embedded models nowadays still come in TFLite
   * There is no specific reason why the current schedules produce the output they do (correct me if that is not right)
   * If conv2d matches TFLite behaviour, it is also TOSA compliant, which means it aligns with at least one approved and maintained standard

   But again, as long as TVM is at BYONumericalFormat, it's really up to you which behaviour you want to implement.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
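The "prefer the bottom right corner" behaviour the truncated docstring refers to can be sketched for context. In TensorFlow/TFLite, "SAME" padding splits the total padding across the two edges of each spatial dimension and puts any odd leftover pixel on the bottom/right edge. A minimal one-dimensional sketch of that rule (`same_padding_1d` is a hypothetical helper for illustration, not part of the patch):

```python
def same_padding_1d(data_dim: int, kernel_dim: int, stride: int):
    """Return (pad_before, pad_after) for TensorFlow-style "SAME" padding
    along one spatial dimension."""
    out_dim = -(-data_dim // stride)  # ceil(data_dim / stride)
    total_pad = max((out_dim - 1) * stride + kernel_dim - data_dim, 0)
    pad_before = total_pad // 2         # smaller half goes on the top/left
    pad_after = total_pad - pad_before  # odd leftover goes on the bottom/right
    return pad_before, pad_after

print(same_padding_1d(5, 3, 2))  # (1, 1): even total pad splits evenly
print(same_padding_1d(5, 2, 2))  # (0, 1): the single odd pixel goes right
```

When the total padding is odd, the asymmetry means the window positions are shifted relative to a top/left-preferring scheme, which is why an offset computation is needed to reproduce TFLite's output exactly.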
