adstraw commented on a change in pull request #9390:
URL: https://github.com/apache/tvm/pull/9390#discussion_r743876444



##########
File path: tests/python/contrib/test_hexagon/test_conv2d_conv2d.md
##########
@@ -0,0 +1,860 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+Hexagon conv2d -> conv2d schedules
+
+# Baseline conv2d -> conv2d
+
+This is a baseline 1x1 conv2d -> 1x1 conv2d schedule for Hexagon.
+
+## Command
+
+pytest -sv "tests/python/contrib/test_hexagon/test_conv2d_conv2d.py::TestConv2dConv2dPackedFilter::test_conv2d[1-64-128-0-1-1-128-1-1-128-1-1-float32-llvm]"
+
+## Parameters
+
+| Parameter                | Value |
+| ------------------------ | ----- |
+| Batch                    | 1     |
+| Input Size               | 64x64 |
+| Input Channel            | 128   |
+| Conv2d #1 Pad            | 0     |
+| Conv2d #1 Stride         | 1     |
+| Conv2d #1 Kernel Size    | 1     |
+| Conv2d #1 Output Channel | 128   |
+| Conv2d #2 Stride         | 1     |
+| Conv2d #2 Kernel Size    | 1     |
+| Conv2d #2 Output Channel | 128   |
+| k_split                  | 1     |
+| h_split                  | 1     |
+
+## Constants
+
+| Constant           | Value |
+| ------------------ | ----- |
+| Conv2d #2 Pad      | 0     |
+| Conv2d #1 Dilation | 1     |
+| Conv2d #2 Dilation | 1     |
+
+## Shapes and Layouts
+
+The input is provided and padded in logical layout and then packed into its physical layout prior to compute.  Logical layout / shape information is provided as a reference for physical tensors.
+
+| Tensor       | Type     | Layout      | Shape                  | Logical Layout | Logical Shape    |
+| ------------ | -------- | ----------- | ---------------------- | -------------- | ---------------- |
+| Input        | Logical  | NHWC        | [1, 64, 64, 128]       |                |                  |
+| Padded Input | Logical  | NHWC        | [1, 64, 64, 128]       |                |                  |
+| Packed Input | Physical | NHWC8h8w32c | [1, 8, 8, 4, 8, 8, 32] | NHWC           | [1, 64, 64, 128] |
+| Filter 1     | Physical | OIHW8i32o4i | [4, 4, 1, 1, 8, 32, 4] | OIHW           | [128, 128, 1, 1] |
+| Temp Output  | Physical | NHWC8h8w32c | [1, 8, 8, 4, 8, 8, 32] | NHWC           | [1, 64, 64, 128] |
+| Filter 2     | Physical | OIHW8i32o4i | [4, 4, 1, 1, 8, 32, 4] | OIHW           | [128, 128, 1, 1] |
+| Output       | Physical | NHWC8h8w32c | [1, 8, 8, 4, 8, 8, 32] | NHWC           | [1, 64, 64, 128] |
+
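The packed shapes in the table follow mechanically from the logical shapes.  As an illustrative sketch (not the code under review), the shape transforms can be reproduced with plain numpy reshape/transpose; the exact interleaving of the 8i/4i sub-axes below is an assumption, only the resulting shapes are taken from the table:

```python
import numpy as np

def pack_nhwc_8h8w32c(x):
    """Pack a logical NHWC tensor into NHWC8h8w32c (sketch)."""
    n, h, w, c = x.shape
    # split H, W, C into outer chunks and inner tiles of 8, 8, 32
    x = x.reshape(n, h // 8, 8, w // 8, 8, c // 32, 32)
    # reorder to [N, H//8, W//8, C//32, 8h, 8w, 32c]
    return x.transpose(0, 1, 3, 5, 2, 4, 6)

def pack_oihw_8i32o4i(f):
    """Pack a logical OIHW filter into OIHW8i32o4i (sketch)."""
    o, i, h, w = f.shape
    # split O into (O//32, 32o) and I into (I//32, 8i, 4i)
    f = f.reshape(o // 32, 32, i // 32, 8, 4, h, w)
    # reorder to [O//32, I//32, H, W, 8i, 32o, 4i]
    return f.transpose(0, 2, 5, 6, 3, 1, 4)

inp = np.zeros((1, 64, 64, 128), dtype="float32")
flt = np.zeros((128, 128, 1, 1), dtype="float32")
print(pack_nhwc_8h8w32c(inp).shape)  # (1, 8, 8, 4, 8, 8, 32)
print(pack_oihw_8i32o4i(flt).shape)  # (4, 4, 1, 1, 8, 32, 4)
```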
+## Schedule
+
+This is the conv2d compute schedule:
+
+```
+  for (ko.outer: int32, 0, 4) {
+    for (ho.outer: int32, 0, 8) {
+      // caches computed here
+      for (wo.c: int32, 0, 8) {
+        for (rc.outer_1: int32, 0, 4) {
+          for (hi.c: int32, 0, 8) {
+            for (wi.c: int32, 0, 8) {
+              for (ki.c: int32, 0, 32) {
+                for (rc.inner_1: int32, 0, 32) {
+```
+
+Note that conv2d #1 has an independent loop over the channel-out dimension `ko.outer`.  This is because the output channels of conv2d #1 are the input channels to conv2d #2, and since each conv2d computes over all of its input channels, all output channels of conv2d #1 must be computed before conv2d #2 can start.
+
+```
+      for (ko.outer_1: int32, 0, 2) {
+```
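The dependency can be made concrete with a hypothetical loop-order sketch (loop bounds from the TIR above, bodies are stand-ins): conv2d #2 consumes every input-channel chunk `rc.outer_1` in 0..3, and each such chunk is produced by one `ko.outer` iteration of conv2d #1, so all four conv2d #1 iterations must finish first.

```python
# Stand-in trace of the producer/consumer ordering between the two conv2ds.
K1_OUT, K2_OUT, RC_OUT = 4, 2, 4   # ko.outer, ko.outer_1, rc.outer_1 extents

produced = []
for ko in range(K1_OUT):           # conv2d #1: all output-channel chunks first
    produced.append(ko)            # stand-in for one 32-channel output slice
for ko1 in range(K2_OUT):          # conv2d #2: its own output-channel chunks
    for rc in range(RC_OUT):       # reads every chunk conv2d #1 produced
        assert rc in produced      # dependency is satisfied
```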
+
+## Cache Usage
+
+*Input Cache*
+
+We compute over the WC8h8w32c portion of the input, so we need 8 * 4 * 8 * 8 * 32 = 64kb for the input cache.
+
+```
+  allocate(packed_input.global: Pointer(global float32), float32, [65536]), storage_scope = global;
+```
+
+*Filter Cache*
+
+We compute over the IHW8i32o4i portion of each filter, so we need 4 * 1 * 1 * 8 * 32 * 4 = 4kb for the filter cache.
+
+```
+  allocate(packed_filter.global: Pointer(global float32), float32, [4096]), storage_scope = global;
+```
+
+Note that there is just one filter cache, which is reused for conv2d #1 / filter #1 and conv2d #2 / filter #2.
+
+*Output Cache*
+
+We compute over the WK8h8w32k portion of the output, where `k` denotes the output channel.  The output cache is computed for each `ko.outer`, which suggests a size of W * 8h * 8w * 32k = 8 * 8 * 8 * 32 = 16kb, and that is in fact the size in the single-conv2d case.  But, as already noted, in this conv2d -> conv2d case "the output channels of conv2d #1 are the input channels to conv2d #2 and we compute over all input channels for each conv2d so we must compute over all output channels of conv2d #1 before we compute conv2d #2".  This means the output cache must grow accordingly, to K * W * 8h * 8w * 32k = 4 * 8 * 8 * 8 * 32 = 64kb.  There is a temporary allocation to store the results of conv2d #1:
+
+```
+  allocate(temp_output: Pointer(global float32), float32, [65536]), storage_scope = global;
+```
+
+Note that the input cache is reused to store the results of conv2d #2.
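As a sanity check on the arithmetic above, the element counts behind the three caches can be recomputed directly.  (Reading the quoted "kb" figures as counts of 1024 float32 elements, matching the `allocate()` extents, is my interpretation, not stated in the source.)

```python
# Recompute raw float32 element counts for each cache from the tiling factors.
input_cache = 8 * 4 * 8 * 8 * 32       # W * C//32 * 8h * 8w * 32c
filter_cache = 4 * 1 * 1 * 8 * 32 * 4  # I//32 * H * W * 8i * 32o * 4i
output_cache = 4 * 8 * 8 * 8 * 32      # K//32 * W * 8h * 8w * 32k

# These match the allocate() extents: [65536], [4096], [65536].
print(input_cache, filter_cache, output_cache)  # 65536 4096 65536
```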

Review comment:
       Added the TODO in the README.  Will add to the backlog as well.



