guberti commented on code in PR #12969: URL: https://github.com/apache/tvm/pull/12969#discussion_r988798528
##########
python/tvm/topi/arm_cpu/mprofile/dsp/tensordot_conv2ds.py:
##########

@@ -0,0 +1,276 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""Implementations of several conv2d variations, all tensorized using tensordot and optimized for
+Cortex-M DSP. Currently contains a standard conv2d and depthwise conv2d implementation, but could
+be extended to add a grouped conv2d operator. Due to the way we tensorize, this schedule ONLY
+works when the data and kernel layouts are NCHWxc and OIHWxi respectively, where x is the number
+of input channels divided by the number of groups."""
+
+import random
+import string
+from typing import Union, Tuple
+
+from tvm import te
+from tvm.tir import indexdiv, indexmod
+from tvm.topi.utils import traverse_inline
+from tvm.topi.nn.pad import pad
+
+from .micro_kernel.tensordot import (
+    make_intrin_tensordot,
+    tensordot_impl,
+)
+
+
+def _unpack_2d_argument(argument: Union[int, Tuple]) -> Tuple:
+    if isinstance(argument, int):
+        return (argument, argument)
+    assert len(argument) == 2
+    return argument
+
+
+def _check_no_dilation(dilation: Union[int, Tuple]) -> None:
+    """Takes a dilation argument as an integer or tuple, and makes sure both dimensions are 1.
+
+    Dilation prevents us from using DSP instructions, so this schedule can't work (aside from the
+    niche case where dilation_h == stride_h and dilation_w == stride_w, which is rare enough that
+    we probably don't need to support it)."""
+
+    dilation_h, dilation_w = _unpack_2d_argument(dilation)
+    assert dilation_h == dilation_w == 1
+
+
+def _unpack_padding(padding: Tuple) -> Tuple:
+    assert isinstance(padding, tuple)
+    if len(padding) == 2:
+        (pad_up, pad_down), (pad_left, pad_right) = padding
+    else:
+        pad_up, pad_left, pad_down, pad_right = padding
+    return pad_up, pad_left, pad_down, pad_right
+
+
+def _pad_if_needed(data: te.tensor.Tensor, layout: str, padding: Tuple) -> te.tensor.Tensor:
+    """Performs padding on a te.tensor.Tensor object if necessary. If padding = (0, 0, 0, 0), the
+    input tensor is returned unmodified. We only care about tuples here - "VALID" and "SAME"
+    padding will have been converted by the TFLite importer if present."""
+
+    pad_up, pad_left, pad_down, pad_right = padding
+    if not any(padding):
+        return data
+
+    # We want to pad the "H" and "W" columns, and their position depends on the layout
+    pad_before, pad_after = [0, 0, 0, 0], [0, 0, 0, 0]
+    pad_before[layout.index("H")] = pad_up
+    pad_before[layout.index("W")] = pad_left
+    pad_after[layout.index("H")] = pad_down
+    pad_after[layout.index("W")] = pad_right
+    return pad(data, pad_before, pad_after, name="padded_data")
+
+
+def _compute_output_dim(data_dim, kernel_dim, pad_before, pad_after, stride) -> int:
+    return (data_dim - kernel_dim + pad_before + pad_after) // stride + 1
+
+
+def _compute_offset(data_dim, kernel_dim, pad_before, pad_after, stride) -> int:
+    """Computes offsets to "prefer" the bottom right corner. This is done to match TensorFlow's

Review Comment:
   An example might be helpful to see the difference between the Relay and TensorFlow behaviors. Consider the case where we are trying to convolve an unpadded 6x6 matrix (left) by a 3x3 kernel (right) with `stride = 2`.
<p align="center"> <img src="https://user-images.githubusercontent.com/3069006/194267005-49ef4902-9d66-4429-be77-3808307221f2.png"> </p>

Since our stride is `2` and the matrix will not be padded, our output will be a `2x2` matrix. TVM would do the computation as follows:

<p align="center"> <img src="https://user-images.githubusercontent.com/3069006/194270615-3997e5b7-f9d5-4ee5-a19c-d0a25099ff12.png"> </p>

However, when there is ambiguity, TensorFlow and TFLite prefer to push the kernel towards the bottom right of the tensor. Hence, TFLite would do this computation as:

<p align="center"> <img src="https://user-images.githubusercontent.com/3069006/194273215-33c0cfdf-70d5-4c2d-bdcc-29480c056a16.png"> </p>

Neither convention is inherently better, and a difference only exists in some cases: where `stride > 1`, the tensor is not padded (because correct padding avoids this ambiguity), and the tensor width minus the kernel width is not divisible by the stride. Even when a difference does exist, it doesn't affect accuracy _too_ much, as conv2d operators tend to be resistant to translation. However, some models are sensitive to this. For example, many audio models use conv2d operations on spectrogram inputs (where the axes are pitch and time), and changing the convention here is akin to shifting the pitch of the input data by one "notch". Audio models also have unusually small input dimensions (e.g. 8 by 25), which makes the problem worse. microTVM has had accuracy issues for a while when importing audio models from TFLite, and I believe this convention is part of the issue.
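   The two conventions can be sketched numerically. This is a minimal, hypothetical model in plain Python (the body of `_compute_offset` is not shown in this hunk, so the formula below is my assumption, derived from the `_compute_output_dim` formula): the leftover pixels `(data_dim - kernel_dim + pad_before + pad_after) % stride` that don't fit an integer number of strides are skipped at the *start* of each axis, which pushes every window toward the bottom right.

```python
def compute_offset(data_dim, kernel_dim, pad_before, pad_after, stride):
    # Assumed behavior of _compute_offset: the remainder pixels that a
    # stride-aligned sweep cannot cover are dropped from the top/left,
    # matching TFLite's bottom-right preference.
    return (data_dim - kernel_dim + pad_before + pad_after) % stride

def window_starts(data_dim, kernel_dim, stride, offset=0):
    # Top-left index of each kernel window along one axis.
    out_dim = (data_dim - kernel_dim) // stride + 1
    return [offset + i * stride for i in range(out_dim)]

# Unpadded 6x6 input, 3x3 kernel, stride 2 (the example above):
print(window_starts(6, 3, 2))  # TVM/Relay convention: [0, 2] (row 5 unused)
print(window_starts(6, 3, 2, compute_offset(6, 3, 0, 0, 2)))  # TFLite: [1, 3] (row 0 unused)
```

   With stride 2 the two output rows sample input rows {0, 2} under the Relay convention but {1, 3} under the TFLite convention, which is exactly the one-"notch" shift described above.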
