ekalda commented on code in PR #12969: URL: https://github.com/apache/tvm/pull/12969#discussion_r988867814
##########
python/tvm/topi/arm_cpu/mprofile/dsp/tensordot_conv2ds.py:
##########

@@ -0,0 +1,276 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Implementations of several conv2d variations, all tensorized using tensordot and optimized for
+Cortex-M DSP. Currently contains a standard conv2d and depthwise conv2d implementation, but could be
+extended to add a grouped conv2d operator. Due to the way we tensorize, this schedule ONLY works
+when the data and kernel layouts are NCHWxc and OIHWxi respectively, where x is the number of
+input channels divided by the number of groups."""
+
+import random
+import string
+from typing import Union, Tuple
+
+from tvm import te
+from tvm.tir import indexdiv, indexmod
+from tvm.topi.utils import traverse_inline
+from tvm.topi.nn.pad import pad
+
+from .micro_kernel.tensordot import (
+    make_intrin_tensordot,
+    tensordot_impl,
+)
+
+
+def _unpack_2d_argument(argument: Union[int, Tuple]) -> Tuple:
+    if isinstance(argument, int):
+        return (argument, argument)
+    assert len(argument) == 2
+    return argument
+
+
+def _check_no_dilation(dilation: Union[int, Tuple]) -> None:
+    """Takes a dilation argument as an integer or tuple, and makes sure both dimensions are 1.
+
+    Dilation prevents us from using DSP instructions, so this schedule can't work (aside from the
+    niche case where dilation_h == stride_h and dilation_w == stride_w, which is rare enough we
+    probably don't need to support it)."""
+
+    dilation_h, dilation_w = _unpack_2d_argument(dilation)
+    assert dilation_h == dilation_w == 1
+
+
+def _unpack_padding(padding: Tuple) -> Tuple:
+    assert isinstance(padding, tuple)
+    if len(padding) == 2:
+        (pad_up, pad_down), (pad_left, pad_right) = padding
+    else:
+        pad_up, pad_left, pad_down, pad_right = padding
+    return pad_up, pad_left, pad_down, pad_right
+
+
+def _pad_if_needed(data: te.tensor.Tensor, layout: str, padding: Tuple) -> te.tensor.Tensor:
+    """Performs padding on a te.tensor.Tensor object if necessary. If padding = (0, 0, 0, 0), the
+    input tensor is returned unmodified. We only care about tuples here - "VALID" and "SAME"
+    padding will be converted by the TFLite importer if present."""
+
+    pad_up, pad_left, pad_down, pad_right = padding
+    if not any(padding):
+        return data
+
+    # We want to pad the "H" and "W" columns, and their position depends on the layout
+    pad_before, pad_after = [0, 0, 0, 0], [0, 0, 0, 0]
+    pad_before[layout.index("H")] = pad_up
+    pad_before[layout.index("W")] = pad_left
+    pad_after[layout.index("H")] = pad_down
+    pad_after[layout.index("W")] = pad_right
+    return pad(data, pad_before, pad_after, name="padded_data")
+
+
+def _compute_output_dim(data_dim, kernel_dim, pad_before, pad_after, stride) -> int:
+    return (data_dim - kernel_dim + pad_before + pad_after) // stride + 1
+
+
+def _compute_offset(data_dim, kernel_dim, pad_before, pad_after, stride) -> int:
+    """Computes offsets to "prefer" the bottom right corner.
+    This is done to match TensorFlow's

Review Comment:
   The numerical alignment question is really interesting - AFAIK TVM doesn't claim to implement any numerical standard; the accuracy requirement is essentially at the level of "if it says in the end that it's a cat, it's good enough". That's true, though, for the whole ML compilation landscape, especially for server workloads. It's a bit unfortunate, but that's where ML compilation is at the moment :)

   Embedded ML compilation is starting to feel the burn of the absence of a standard a bit more, though, since it is becoming relevant to functional safety, where bit accuracy is important. TVM's lack of a numerical standard has given us a lot of headache in the past, and on a few occasions the question of whether we need separate Relay op implementations/compute definitions to match the behaviours of various frameworks has been on the table. I suppose we will have to do it at some point, but that's probably out of scope for this patch.

   I personally think that matching the behaviour of the Cortex-M schedules to TFLite is not a bad idea, because:
   * Most embedded models nowadays still come in TFLite
   * There is no specific reason why the current schedules produce the output they do (correct me if that is not right)
   * If conv2d matches TFLite behaviour, it is also TOSA compliant, which means it aligns with at least one approved and maintained standard

   But again, as long as TVM is at BYONumericalFormat, it's really up to you which behaviour you want to implement.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
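The "prefer the bottom right corner" behaviour the truncated docstring refers to can be sketched for context. In TensorFlow/TFLite, "SAME" padding splits the total padding across the two edges of each spatial dimension and puts any odd leftover pixel on the bottom/right edge. A minimal one-dimensional sketch of that rule (`same_padding_1d` is a hypothetical helper for illustration, not part of the patch):

```python
def same_padding_1d(data_dim: int, kernel_dim: int, stride: int):
    """Return (pad_before, pad_after) for TensorFlow-style "SAME" padding
    along one spatial dimension."""
    out_dim = -(-data_dim // stride)  # ceil(data_dim / stride)
    total_pad = max((out_dim - 1) * stride + kernel_dim - data_dim, 0)
    pad_before = total_pad // 2         # smaller half goes on the top/left
    pad_after = total_pad - pad_before  # odd leftover goes on the bottom/right
    return pad_before, pad_after

print(same_padding_1d(5, 3, 2))  # (1, 1): even total pad splits evenly
print(same_padding_1d(5, 2, 2))  # (0, 1): the single odd pixel goes right
```

When the total padding is odd, the asymmetry means the window positions are shifted relative to a top/left-preferring scheme, which is why an offset computation is needed to reproduce TFLite's output exactly.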
