jtuyls commented on a change in pull request #7350:
URL: https://github.com/apache/tvm/pull/7350#discussion_r579214604



##########
File path: docs/deploy/vitis_ai.rst
##########
@@ -541,20 +551,55 @@ TVM.
    import tvm
    import tvm.relay as relay
    from tvm.contrib.target import vitis_ai
-   from tvm.contrib import util, graph_runtime
+   from tvm.contrib import utils, graph_runtime
    from tvm.relay.build_module import bind_params_by_name
    from tvm.relay.op.contrib.vitis_ai import annotation
 
 After importing a convolutional neural network model using the usual
 Relay API's, annotate the Relay expression for the given Vitis-AI DPU
 target and partition the graph.
 
+.. note::
+
+    We recommend converting DPU convolutions' data layouts to NHWC and CPU 
convolutions'
+    data layouts to NCHW for best DPU and out of the box CPU performance. You 
can use the
+    ConvertLayout transformation pass two times to achieve this as 
demonstrated in the code
+    block underneath. You can also leave the CPU convolution layouts in NHWC 
and tune ARM CPU
+    performance for this data layout to avoid the layout transformation 
overheads introduced by
+    executing DPU convolutions in NHWC and CPU convolutions in NCHW
+    (check out the `AutoScheduling 
<https://tvm.apache.org/docs/tutorials/index.html#autoscheduler-template-free-auto-scheduling>`__
+    and `AutoTuning 
<https://tvm.apache.org/docs/tutorials/autotvm/tune_relay_arm.html>`__
+    tutorials for this).
+
 .. code:: python
 
    mod["main"] = bind_params_by_name(mod["main"], params)
+   
+   # For edge DPU we recommend converting the convolutions' data layout
+   #    to NHWC for best performance. Therefore, we first convert the layouts
+   #    of all convolutions to NHWC before partitioning. Afterwards, we can
+   #    convert any remaining convolutions (to be executed on CPU) back to 
NCHW.
+   desired_layouts = {'nn.conv2d': ['NHWC', 'default']}
+   seq = tvm.transform.Sequential([relay.transform.RemoveUnusedFunctions(),
+                                   
relay.transform.ConvertLayout(desired_layouts),
+                                   relay.transform.FoldConstant()])
+   with tvm.transform.PassContext(opt_level=3):
+       mod = seq(mod)
+            
+   # Annotate and partition the Relay expression for the given target
    mod = annotation(mod, params, target)
    mod = relay.transform.MergeCompilerRegions()(mod)
    mod = relay.transform.PartitionGraph()(mod)
+   
+   # After partitioning we recommend transforming the remaining convolutions
+   #    (that will be executed on CPU, if any) back to NCHW data layout
+   #    for best CPU performance
+   desired_layouts = {'nn.conv2d': ['NCHW', 'default']}
+   seq = tvm.transform.Sequential([relay.transform.RemoveUnusedFunctions(),

Review comment:
       I am not sure. I just took this from the ConvertLayout documentation a 
while ago. It seems like it's still mentioned there: 
https://tvm.apache.org/docs/dev/convert_layout.html?highlight=layout_transform. 
But if it's not necessary we can remove it here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to