[GitHub] [tvm-rfcs] mbaret commented on a change in pull request #37: Introduce the Arm(R) Ethos(TM)-U Cascading Planner

GitBox Tue, 02 Nov 2021 09:57:45 -0700


mbaret commented on a change in pull request #37:
URL: https://github.com/apache/tvm-rfcs/pull/37#discussion_r741292684




##########
File path: rfcs/0037-arm-ethosu-cascading-planner.md
##########
@@ -41,51 +41,17 @@ Deciding on exactly which operators should be cascaded and 
with what striping pa
 
 The key piece of information to calculate in order to characterize a cascade 
is how the stripe size changes throughout. This is a function of the data 
dependency between an operator's inputs and outputs. For many operators that 
we're interested in, an affine transform matrix can be used to represent this 
dependency if we represent the input and output stripe sizes as vectors. Affine 
transforms typically consider 'augmented' matrices and vectors 
(https://en.wikipedia.org/wiki/Affine_transformation#Augmented_matrix) which 
allow for the representation of constant changes. Concretely, we define the 
transform matrix M as being the matrix for which the following holds:
 
-$$stripe_{in} = {M} \cdot {stripe_{out}}$$
+![meta-schedule-workflow](../resources/cascading-formula-1.png)
 
 Let's briefly consider how to derive such a transform matrix for a 3x3 
unstrided, undilated and unpadded NHWC convolution. Immediately, the '3x3' 
kernel tells us something important: a single element in the output depends on 
3x3 elements in the height/width of the input. If we were instead to consider a 
2x2 region of the output in the height/width dimensions, we'd then need a 4x4 
region in the input. So in general, the rule is that we need 2 more elements in 
height and width when calculating the dependencies of an output stripe. It can 
be shown that more generally this number is the kernel_size-1 in each axis. Now 
to consider the channels, in a convolution no matter how many output elements 
you are computing you'll always need every input channel. This is because the 
input channel axis is a reduction axis in a convolution, in a sense it isn't 
'reflected' in the output. Combining these two observations, we arrive at the 
following transform matrix:

Review comment:
       You're right that this will be different for a depthwise convolution, 
but a depthwise convolution will also use a different transform matrix. Here 
we're only working out the transform matrix for a standard 2D convolution.
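
To make the distinction concrete, here is a minimal sketch of the two matrices, assuming stripes are written as NHWC vectors augmented with a trailing 1. The function names (`conv2d_transform`, `depthwise_transform`, `apply_transform`) and the channel counts are hypothetical and chosen only for illustration; they are not taken from the RFC or from TVM.

```python
# Illustrative sketch only: these helpers are hypothetical, not part of TVM.
# Stripes are [N, H, W, C]; a trailing 1 is appended to form the augmented vector.

def conv2d_transform(kernel_h, kernel_w, in_channels):
    """Augmented M such that stripe_in = M . stripe_out for an unstrided,
    undilated, unpadded NHWC convolution."""
    return [
        [1, 0, 0, 0, 0],             # batch is passed through unchanged
        [0, 1, 0, 0, kernel_h - 1],  # height needs kernel_h - 1 extra rows
        [0, 0, 1, 0, kernel_w - 1],  # width needs kernel_w - 1 extra columns
        [0, 0, 0, 0, in_channels],   # reduction axis: always every input channel
        [0, 0, 0, 0, 1],             # keeps the augmented 1
    ]

def depthwise_transform(kernel_h, kernel_w):
    """Same spatial rule, but each output channel depends on one input channel."""
    return [
        [1, 0, 0, 0, 0],
        [0, 1, 0, 0, kernel_h - 1],
        [0, 0, 1, 0, kernel_w - 1],
        [0, 0, 0, 1, 0],             # channels are passed through unchanged
        [0, 0, 0, 0, 1],
    ]

def apply_transform(matrix, stripe_out):
    """Multiply the augmented matrix by the augmented output stripe."""
    augmented = list(stripe_out) + [1]
    result = [sum(m * s for m, s in zip(row, augmented)) for row in matrix]
    return result[:-1]  # drop the augmented 1

# A 2x2 output region of a 3x3 convolution with 16 input channels:
print(apply_transform(conv2d_transform(3, 3, 16), [1, 2, 2, 8]))
# The same output region for a 3x3 depthwise convolution:
print(apply_transform(depthwise_transform(3, 3), [1, 2, 2, 8]))
```

The only row that differs between the two is the channel row: the standard convolution ignores the output channel count and always requests every input channel, while the depthwise sketch copies the channel count through.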




