[GitHub] [incubator-tvm] anijain2305 commented on a change in pull request #4664: [Docs] Convert Layout pass.

GitBox Mon, 13 Jan 2020 09:23:16 -0800

anijain2305 commented on a change in pull request #4664: [Docs] Convert Layout 
pass.
URL: https://github.com/apache/incubator-tvm/pull/4664#discussion_r365927585


 ##########
 File path: docs/dev/convert_layout.rst
 ##########
 @@ -0,0 +1,238 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+..    http://www.apache.org/licenses/LICENSE-2.0
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+===================
+Convert Layout Pass
+===================
+**Author**: `Animesh Jain <https://github.com/anijain2305>`_
+
+*************
+1. Background
+*************
+
+Data layout format describes how data is laid out in memory. For example, the TensorFlow framework's default data layout for the convolution operator is NHWC, i.e., the data is 4-dimensional and is laid out in row-major format with N being the first dimension and C being the last dimension. Data layout plays a major role in model performance, significantly affecting spatial and temporal locality. For example, the Intel x86 backend in TVM prefers the NCHWc layout, where the C dimension is tiled into 2 dimensions to exploit data locality efficiently. Similarly, the CUDA backend prefers the data layout to be in NCHW format.
+
+Essentially, TVM has to deal with data layouts throughout the compiler toolchain: framework parsers, Relay layout transformations, and TOPI schedules. As we move towards third-party codegen integrations, which might have their own data layout restrictions, handling layouts at all levels of the TVM toolchain is going to become even more challenging. Therefore, we developed a new Relay pass, **ConvertLayout**, to reduce some of the complications that arise due to layout handling.
+
+If you only want to understand how to use the ConvertLayout pass, jump directly to Section 4 - Usage.
+
+**************************
+2. Motivation and Overview
+**************************
+
+Let's look at a simple scenario to understand the complications that arise due to different layouts. Suppose we want to compile a TensorFlow NHWC graph for an ARM edge device, but TOPI currently supports only NCHW schedules for ARM. There is then a mismatch between the framework layout and the TOPI-supported layout. One way to deal with this mismatch is to insert layout transforms before and after each convolution, such that the resulting convolution has an NCHW input data layout and can use TOPI schedules. However, this can lead to performance degradation because of the presence of too many layout transforms.
+
+We encountered similar problems in other use cases as well:
+
+- No way to run TFLite graphs on Nvidia GPUs. TOPI has NCHW-only schedules for 
GPUs.
+- Increasingly complicated logic in AlterOpLayout for convolution to support different pairs of layout transformations.
+- Sub-optimal performance for TF graphs due to extra layout transforms.
+- Complications in third-party codegen integrations like TensorRT that prefer the data layout to be in one format.
+
+To solve these problems, we introduced the *ConvertLayout* pass, which sets up the infrastructure to change the data layout of the whole graph with a minimal number of data layout transforms. In ideal cases, we will have only 2 layout transforms for data, one at the start and one at the end. An example of the transformation is shown below:
+
+
+.. code-block:: python
+
+       # Original graph - 2 convolutions in NHWC format.
+       fn (%x: Tensor[(1, 56, 56, 64), float32], %weight1: Tensor[(3, 3, 64, 32), float32], %weight2: Tensor[(3, 3, 32, 32), float32]) {
+         %0 = nn.conv2d(%x, %weight1, padding=[1, 1], channels=32, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWIO");
+         %1 = nn.relu(%0);
+         %2 = nn.conv2d(%1, %weight2, padding=[1, 1], channels=32, kernel_size=[3, 3], data_layout="NHWC", kernel_layout="HWIO");
+         nn.relu(%2)
+       }
+
+       # After ConvertLayout - For data, there is a transform at the start and at the end.
+       # For weights, there are transforms to adapt to NCHW layout. These will be removed with FoldConstant pass.
+       fn (%x: Tensor[(1, 56, 56, 64), float32], %weight1: Tensor[(3, 3, 64, 32), float32], %weight2: Tensor[(3, 3, 32, 32), float32]) {
+         %0 = layout_transform(%x, src_layout="NHWC", dst_layout="NCHW") /* ty=Tensor[(1, 64, 56, 56), float32] */;
+         %1 = layout_transform(%weight1, src_layout="HWIO", dst_layout="OIHW") /* ty=Tensor[(32, 64, 3, 3), float32] */;
+         %2 = nn.conv2d(%0, %1, padding=[1, 1], channels=32, kernel_size=[3, 3]) /* ty=Tensor[(1, 32, 56, 56), float32] */;
+         %3 = nn.relu(%2) /* ty=Tensor[(1, 32, 56, 56), float32] */;
+         %4 = layout_transform(%weight2, src_layout="HWIO", dst_layout="OIHW") /* ty=Tensor[(32, 32, 3, 3), float32] */;
+         %5 = nn.conv2d(%3, %4, padding=[1, 1], channels=32, kernel_size=[3, 3]) /* ty=Tensor[(1, 32, 56, 56), float32] */;
+         %6 = nn.relu(%5) /* ty=Tensor[(1, 32, 56, 56), float32] */;
+         layout_transform(%6, src_layout="NCHW", dst_layout="NHWC") /* ty=Tensor[(1, 56, 56, 32), float32] */
+       }
+
+
+*********
+3. Design
+*********
+
+Before delving into the ConvertLayout pass, let's group the operators into 3 categories based on their sensitivity to data layouts. This categorization will be useful later for understanding the details of the ConvertLayout pass.
+
+- **Layout agnostic** - Relu, pow etc. These operators are not affected by data layouts, in either functionality or performance.
+- **Lightly-layout sensitive** - pad, concatenate, reduce ops like sum etc. These operators have some attributes that are functionally affected if we do a layout transformation before them. However, performance-wise, the difference is not significant. For these operators, it is beneficial to just adapt to the output data layout of the previous operator.
+- **Heavily-layout sensitive** - Convolution, conv2d_transpose etc. These operators are heavily affected, both functionally and performance-wise, by data layouts. They also have the data layout as an op attribute. Typically, it is beneficial to modify the input data layouts for these operators (if the current layout is not a performant one), while the remaining *layout agnostic* and *lightly-layout sensitive* operators adapt to the layout governed by the output of these *heavily-layout sensitive* operators.
+
+
+Let us now look at two relevant Relay operator properties. Each Relay operator has properties, like InferType, that can be defined by a TVM developer. Typically, a Relay pass traverses the graph operator-by-operator and reads these operator properties. For example, the InferType pass looks at the InferType property of an operator, determines its output shape and type, and then passes them to the InferType property of the next operator. Similarly, in our context, we have 2 such properties: *FTVMConvertLayout* and *FInferCorrectLayout*. The ConvertLayout pass traverses the graph and uses these 2 properties, along with an automatic layout transform insertion module, to handle data layouts. So, the whole process can be broken down into 3 steps:
+
+- Run FTVMConvertLayout property - This allows developers to transform the original Relay expr into a new Relay expr with new layouts, allowing user-defined layout alteration. A Python callback is provided for the developer's convenience. This is used only for heavily-layout sensitive operators.
+- Run FInferCorrectLayout property - We can view this as layout inference. It looks at the original input layouts and the new input layouts, which come either from the previous operator or from the FTVMConvertLayout-modified expr (if it was used). Lightly-layout sensitive operators can use this to adapt their attributes to the new data layouts. Layout inference happens for each operator.
+- Automatic insertion of layout transforms - The previos step - layout 
inference - sets the new layout for the input exprs. If these layouts are 
different from the original layouts, then this component automatically inserts 
a layout transform. Therefore, a developer does not need to do anything for 
this component.
 
 Review comment:
   previos -> previous
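
A rough sketch of the three steps described in this hunk, written as a toy model in plain Python. This is not TVM's implementation: `convert_layout`, `HEAVY`, and the `(op, in_layout, out_layout)` tuple format are hypothetical names made up for illustration, the graph is assumed to be a linear chain, and lightly-layout sensitive ops (which would also adapt their attributes in step 2) are not modeled.

```python
# Toy model of the three ConvertLayout steps. Hypothetical names throughout;
# this only mimics the control flow described in the doc, not TVM internals.

HEAVY = {"conv2d", "conv2d_transpose"}  # ops assumed to carry a data_layout attribute


def convert_layout(ops, src_layout, desired_layout):
    """Walk a linear chain of op names and insert layout transforms.

    Returns a list of (op_name, input_layout, output_layout) tuples.
    """
    out = []
    current = src_layout  # layout of the tensor flowing through the chain
    for op in ops:
        if op in HEAVY:
            # Step 1 (FTVMConvertLayout-like): the op asks for the desired layout.
            wanted = desired_layout
        else:
            # Step 2 (layout-inference-like): adapt to the producer's layout.
            wanted = current
        if wanted != current:
            # Step 3: automatic insertion of a transform on a layout mismatch.
            out.append(("layout_transform", current, wanted))
            current = wanted
        out.append((op, current, current))
    if current != src_layout:
        # Restore the original layout for the graph output.
        out.append(("layout_transform", current, src_layout))
    return out


result = convert_layout(["conv2d", "relu", "conv2d", "relu"], "NHWC", "NCHW")
for entry in result:
    print(entry)
```

With the two-convolution chain from Section 2, this toy inserts one data transform at the start and one at the end, which matches the "ideal case" the text describes.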

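The `ty=` annotations in the Section 2 example can be checked with a small pure-Python sketch. It assumes a layout string is just a permutation of single-letter dimension names, so tiled layouts such as NCHWc are out of scope; `layout_axes` and `transform_shape` are hypothetical helper names, not TVM functions.

```python
def layout_axes(src_layout, dst_layout):
    """Return the axis permutation that maps src_layout to dst_layout."""
    if len(set(src_layout)) != len(src_layout):
        raise ValueError("repeated dimension name in %r" % src_layout)
    if sorted(src_layout) != sorted(dst_layout):
        raise ValueError("%r is not a permutation of %r" % (dst_layout, src_layout))
    # For each destination dimension, find where it lives in the source.
    return tuple(src_layout.index(dim) for dim in dst_layout)


def transform_shape(shape, src_layout, dst_layout):
    """Shape of a tensor after a pure-permutation layout transform."""
    if len(shape) != len(src_layout):
        raise ValueError("shape rank does not match layout rank")
    return tuple(shape[axis] for axis in layout_axes(src_layout, dst_layout))


# Data tensor %x: NHWC -> NCHW
print(transform_shape((1, 56, 56, 64), "NHWC", "NCHW"))  # (1, 64, 56, 56)
# Kernel %weight1: HWIO -> OIHW
print(transform_shape((3, 3, 64, 32), "HWIO", "OIHW"))   # (32, 64, 3, 3)
```

The same permutation returned by `layout_axes` is what one would pass to a transpose routine to move the actual data.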