This is an automated email from the ASF dual-hosted git repository.
echuraev pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git
The following commit(s) were added to refs/heads/main by this push:
new c38a0c50a7 [Adreno] Add documentation for Adreno deployment (#13393)
c38a0c50a7 is described below
commit c38a0c50a7cde09d548f570f7aafa8e293ef1485
Author: dsbarinov1 <[email protected]>
AuthorDate: Mon Nov 28 11:25:55 2022 +0300
[Adreno] Add documentation for Adreno deployment (#13393)
* [Adreno] Add documentation for Adreno deployment
Purpose:
Assist TVM users in compiling and deploying models on Adreno by expanding our
documentation and providing sample scripts in TVM.
Information about PR:
This PR consists of three parts.
The first part is an introductory article on compilation and deployment of
neural networks on Adreno, covering such topics as: «Build TVM for
Android/Adreno», «Advantages of textures» and «Differences in compilation and
deployment of models for Adreno devices».
The second part is a straightforward example script for compiling and
inferring models at different precisions for Adreno devices.
The third part is auxiliary files, images, etc.
* Add correct links to images + small fixes
* Remove images (.png)
* Add request_hook in deploy_model_on_adreno.py
* Fix trailing newline + add license
* No newline at the EOF + blanks
* Fix request hook placing
* Fix style
* Fix trailing
* Fix whitespaces
* Fix whitespaces v2
* Add newline at adreno.rst EOF
* Add license to adreno.rst
* Remove sphinx 'autosectionlabel' extension + modify cross-references in docs to work without this extension
* Set default values to tracker_host and tracker_port
* Add local_demo to be able to autogenerate docs
* Fix quotes
* Fix benchmark
* .
---
docs/how_to/deploy/adreno.rst | 336 ++++++++++++++++++++
docs/how_to/deploy/index.rst | 1 +
.../how_to/deploy_models/deploy_model_on_adreno.py | 351 +++++++++++++++++++++
3 files changed, 688 insertions(+)
diff --git a/docs/how_to/deploy/adreno.rst b/docs/how_to/deploy/adreno.rst
new file mode 100644
index 0000000000..af613aa5cb
--- /dev/null
+++ b/docs/how_to/deploy/adreno.rst
@@ -0,0 +1,336 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+Deploy to Adreno GPU
+=======================================
+
+**Authors**: Daniil Barinov, Egor Churaev, Andrey Malyshev
+
+Introduction
+------------
+
+Adreno is a series of graphics processing unit (GPU) semiconductor
+intellectual property cores developed by Qualcomm and used in many of
+their SoCs.
+
+The Adreno GPU accelerates the rendering of complex geometries to
+deliver high-performance graphics and a rich user experience with low
+power consumption.
+
+This guide will demonstrate :ref:`the benefits of using textures with Adreno<advantages_of_the_textures>`,
+how to :ref:`build TVM with OpenCL<building_tvm_for_adreno>` (needed by Adreno devices) and with TVM RPC
+enabled. It will also provide :ref:`example code<build_and_deploy_model_for_adreno>` to better
+understand the differences in compiling and deploying models for Adreno devices.
+
+.. _advantages_of_the_textures:
+
+Advantages of Textures
+--------------------------
+
+One of Adreno's advantages is its clever handling of textures. At
+the moment, TVM is able to benefit from this through texture support
+for Adreno. The figure below shows the Adreno A5x architecture.
+
+|High-level overview of the Adreno A5x architecture for OpenCL|
+
+*Fig. 1 High-level overview of the Adreno A5x architecture for OpenCL*
+
+*source:* `OpenCL Optimization and Best Practices for Qualcomm Adreno GPUs <https://dl.acm.org/doi/10.1145/3204919.3204935>`_
+
+Reasons for using textures:
+
+- The texture processor (TP) has a dedicated L1 cache, which is a read-only cache that stores data
+  fetched from the level-2 (L2) cache for texture operations (this is the primary
+  reason)
+
+- The handling of image boundaries is built-in.
+
+- Numerous image format and data type combinations are supported, with
+  automatic format conversions
+
+Overall, with textures it is possible to achieve a significant performance boost
+compared to OpenCL buffer-based solutions.
+
+.. _building_tvm_for_adreno:
+
+Building TVM for Adreno
+-----------------------
+
+This section gives instructions on how to build the Android part of TVM
+with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+
+Since the process of building TVM for Adreno is exactly the same as
+building TVM for Android, please refer to these instructions: `TVM RPC
+Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+
+Since there are many required packages for Android, you can use the official Docker image to build TVM.
+For more information refer to this guide: `Deploy the Pretrained Model on Android <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
+
+**Prerequisites**: the Android NDK and Android Debug Bridge must
+be installed, the desired device must have OpenCL support, and the Android part of TVM must be built:
+
+- Read the documentation about *Android NDK installation* here: https://developer.android.com/ndk
+- To get access to adb tools see *Android Debug Bridge installation* here: https://developer.android.com/studio/command-line/adb
+
+You can also build the Android part of TVM locally. From the root
+folder of TVM:
+
+::
+
+   mkdir build_android
+   cd build_android
+   cmake .. -DUSE_OPENCL=ON -DUSE_MICRO=OFF -DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 -DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
+   make -jN tvm_runtime tvm_rpc
+
+where **N** is the number of cores available on your *CPU*.
+
+At this stage you have built TVM for Adreno.
+
+.. _build_and_deploy_model_for_adreno:
+
+Build and deploy model for Adreno
+---------------------------------
+
+In this section we will focus on the target needed to compile and deploy models for Adreno,
+demonstrate the differences in the generated kernels with and without textures, and
+consider the possibility of choosing a different precision for model compilation.
+
+For the complete step-by-step process of compiling and deploying models on
+Adreno, including selection of precision, running the inference of the
+model, getting the predictions, and measuring the performance, please refer to this tutorial: `How To Deploy model on Adreno <https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+
+|Android deployment pipeline|
+
+*Fig.2 Deployment pipeline on Adreno devices*
+
+The figure above demonstrates a generalized pipeline for deploying and running
+neural network models on Android devices.
+As can be seen from the figure, the compiled model has set_input() and run() methods,
+which *prepare the inputs* for inference and *execute the inference* on the
+remote device using the Graph Executor runtime module.
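+
+As a minimal sketch of this flow (assuming a compiled module ``rlib`` has
+already been loaded on a remote device ``dev``, and a NumPy input ``input_data``
+has been prepared, as shown in the tutorial linked above), the Graph Executor
+is driven like this:
+
+.. code:: python
+
+    import tvm
+    from tvm.contrib import graph_executor
+
+    m = graph_executor.GraphModule(rlib["default"](dev))
+    m.set_input("input0", tvm.nd.array(input_data))  # prepare the inputs
+    m.run()                                          # execute the inference
+    output = m.get_output(0)                         # fetch the predictions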
+
+Adreno target
+~~~~~~~~~~~~~
+
+Normally, when compiling models for Android using OpenCL, the
+corresponding target is used:
+
+.. code:: python
+
+    target = "opencl"
+
+For Adreno, we want to get all the benefits of textures, so we have to
+use the following target to generate texture-leveraging kernels:
+
+.. code:: python
+
+    target = "opencl -device=adreno"
+
+Let's write a simple model with one convolutional (conv2d) layer and take a
+look at the generated kernels for these two targets:
+
+.. code:: python
+
+    import tvm
+    from tvm import relay
+    import numpy as np
+
+    input_shape = (1, 56, 56, 32)
+    filter_shape = (3, 3, 32, 64)
+    filter = np.random.rand(*filter_shape)
+
+    dtype = "float32"
+    input = tvm.relay.var("input", shape=input_shape, dtype=dtype)
+    weight = tvm.relay.var("weight", shape=filter_shape, dtype=dtype)
+    D = relay.nn.conv2d(input, weight, padding=(1, 1), data_layout="NHWC", kernel_layout="HWIO", out_dtype=dtype)
+
+    mod = relay.Function([input, weight], D)
+    params = {
+        "weight": tvm.nd.array(filter)
+    }
+
+Now compile our model with the classic OpenCL target and print its modules:
+
+.. code:: python
+
+    target = "opencl"
+
+    with tvm.transform.PassContext(opt_level=3):
+        graph, lib, params = relay.build_module.build(mod, target, params=params)
+        print(lib.imported_modules[0].get_source())
+
+Notice that the generated convolution kernel has pointers in
+the function signature. The kernels generated with the above target are buffer-based.
+
+.. code:: c
+
+    __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+    // body..
+
+
+Now take a look at the “opencl -device=adreno” target:
+
+.. code:: python
+
+    target = "opencl -device=adreno"
+
+    with tvm.transform.PassContext(opt_level=3):
+        graph, lib, params = relay.build_module.build(mod, target, params=params)
+        print(lib.imported_modules[0].get_source())
+
+The kernels generated this way actually work with 2D arrays, leveraging textures:
+
+.. code:: c
+
+    __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+    // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image
+object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to
+utilize the hardware in a more efficient way.
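+
+To check programmatically that texture-based kernels were actually generated,
+one can, for example, search the generated OpenCL source for *image2d_t* (a
+simple sketch reusing the ``lib`` built above):
+
+.. code:: python
+
+    opencl_source = lib.imported_modules[0].get_source()
+    # texture-based kernels take image2d_t arguments instead of raw buffers
+    assert "image2d_t" in opencl_source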
+
+Precisions
+~~~~~~~~~~
+
+The right choice of precision for a specific workload can greatly increase the
+efficiency of the solution, shifting the balance between accuracy and speed
+toward whichever is a priority for the problem.
+
+We can choose from *float16*, *float16_acc32* (mixed precision), and *float32* (standard).
+
+**Float16**
+
+To leverage the GPU hardware capabilities and utilize the benefits of half-precision
+computation and memory management,
+we can convert an original model with floating-point operations to a model operating
+at half precision.
+Choosing a lower precision will positively affect the performance of the model,
+but it may also decrease the accuracy of the model.
+To do the conversion you need to write a simple conversion function and
+specify the *dtype* value "float16" before calling it:
+
+.. code:: python
+
+    def convert_to_dtype(mod, dtype):
+        # downcast to float16
+        if dtype == "float16":
+            global conv2d_acc
+            conv2d_acc = "float16"
+            from tvm.ir import IRModule
+            mod = IRModule.from_expr(mod)
+            seq = tvm.transform.Sequential(
+                [
+                    relay.transform.InferType(),
+                    relay.transform.ToMixedPrecision()
+                ]
+            )
+            with tvm.transform.PassContext(opt_level=3):
+                mod = seq(mod)
+        return mod
+
+    dtype = "float16"
+    mod = convert_to_dtype(mod["main"], dtype)
+
+We can then compile our model in any convenient way:
+
+.. code:: python
+
+    with tvm.transform.PassContext(opt_level=3):
+        lib = relay.build(
+            mod, target_host=target_host, target=target, params=params
+        )
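+
+Here ``target`` and ``target_host`` describe the Adreno GPU and the phone's CPU,
+respectively; as a sketch (assuming an arm64 Android device), they might be
+defined as:
+
+.. code:: python
+
+    target = tvm.target.Target("opencl -device=adreno")
+    target_host = tvm.target.Target("llvm -mtriple=arm64-linux-android")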
+
+**float16_acc32 (Mixed Precision)**
+
+The ToMixedPrecision pass traverses the network and splits it into clusters
+of ops dealing with float or float16 data types.
+The clusters are defined by three types of operations:
+
+- Operations that are always converted to the float16 data type
+- Operations that can be converted if they follow a converted cluster
+- Operations that are never converted to the float16 data type
+
+This list is defined in the ToMixedPrecision implementation in
+`relay/transform/mixed_precision.py <https://github.com/apache/tvm/blob/main/python/tvm/relay/transform/mixed_precision.py#L34>`_
+and can be overridden by the user.
+
+In some cases, we want higher precision in the accumulator than in the input data.
+This is supported, for example, for conv2d and dense operations. To override the
+accumulation type you need to register a function with the
+``@register_mixed_precision_conversion`` decorator, which modifies the parameters
+of the ``ToMixedPrecision`` conversion:
+
+.. code:: python
+
+    from tvm.relay.op import register_mixed_precision_conversion
+
+    conv2d_acc = "float32"
+
+    # Pick a priority > 10 to overwrite defaults, higher priorities take precedence
+    @register_mixed_precision_conversion("nn.conv2d", level=11)
+    def conv2d_mixed_precision_rule(call_node: "relay.Call", mixed_precision_type: str):
+        global conv2d_acc
+        return [
+            # always do main calculation in mixed_precision_type
+            relay.transform.mixed_precision.MIXED_PRECISION_ALWAYS,
+            # the dtype for the accumulator
+            conv2d_acc,
+            # the output dtype for the operation (usually fp16)
+            mixed_precision_type,
+        ]
+
+    # Same for dense
+    @register_mixed_precision_conversion("nn.dense", level=11)
+    def dense_mixed_precision_rule(call_node: "relay.Call", mixed_precision_type: str):
+        global conv2d_acc
+        return [
+            relay.transform.mixed_precision.MIXED_PRECISION_ALWAYS,
+            conv2d_acc,
+            mixed_precision_type,
+        ]
+
+Now we need to modify the conversion function by adding some logical "forks"
+and the ToMixedPrecision() call, then create a Relay graph from the desired model
+in any convenient way, obtain **mod** (the IR representation of the model),
+convert it to the required **dtype**, and then assemble our model sequentially:
+
+.. code:: python
+
+    def convert_to_dtype(mod, dtype):
+        # downcast to float16
+        if dtype == "float16" or dtype == "float16_acc32":
+            global conv2d_acc
+            conv2d_acc = "float16" if dtype == "float16" else "float32"
+            from tvm.ir import IRModule
+            mod = IRModule.from_expr(mod)
+            seq = tvm.transform.Sequential(
+                [
+                    relay.transform.InferType(),
+                    relay.transform.ToMixedPrecision()
+                ]
+            )
+            with tvm.transform.PassContext(
+                config={"relay.ToMixedPrecision.keep_orig_output_dtype": True},
+                opt_level=3):
+                mod = seq(mod)
+        return mod
+
+    dtype = "float16_acc32"
+    mod = convert_to_dtype(mod["main"], dtype)
+    dtype = "float32" if dtype == "float32" else "float16"
+
+The ``ToMixedPrecision`` method is a pass that converts an FP32 Relay graph into
+an FP16 version (with FP16 or FP32 accumulation dtypes). Doing this transformation
+is useful for reducing model size, as it halves the expected size of the weights
+(in the FP16_acc16 case).
+
+From this point onwards, we can compile our model as normal:
+
+.. code:: python
+
+    with tvm.transform.PassContext(opt_level=3):
+        lib = relay.build(
+            mod, target_host=target_host, target=target, params=params
+        )
+
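+As a rough way to observe this size reduction, one can compare the total size of
+the compiled parameters (a sketch, assuming ``lib.get_params()`` returns the
+constants of the compiled module, as the graph executor factory does):
+
+.. code:: python
+
+    def param_size_bytes(params):
+        # total number of bytes occupied by the parameter tensors
+        return sum(p.numpy().nbytes for p in params.values())
+
+    print("weights size:", param_size_bytes(lib.get_params()), "bytes")
+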
+.. |High-level overview of the Adreno A5x architecture for OpenCL| image:: https://raw.githubusercontent.com/tlc-pack/web-data/main/images/how-to/adreno_architecture.png
+.. |Android deployment pipeline| image:: https://raw.githubusercontent.com/tlc-pack/web-data/main/images/how-to/android_deployment_pipeline.jpg
diff --git a/docs/how_to/deploy/index.rst b/docs/how_to/deploy/index.rst
index 74bae0f923..ac1e2a1276 100644
--- a/docs/how_to/deploy/index.rst
+++ b/docs/how_to/deploy/index.rst
@@ -169,6 +169,7 @@ target device without relying on RPC. See the following resources on how to do s
cpp_deploy
android
+ adreno
integrate
hls
arm_compute_lib
diff --git a/gallery/how_to/deploy_models/deploy_model_on_adreno.py b/gallery/how_to/deploy_models/deploy_model_on_adreno.py
new file mode 100644
index 0000000000..d6ed1f1f99
--- /dev/null
+++ b/gallery/how_to/deploy_models/deploy_model_on_adreno.py
@@ -0,0 +1,351 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""
+.. _tutorial-deploy-model-on-adreno:
+
+Deploy the Pretrained Model on Adreno
+=======================================
+**Author**: Daniil Barinov
+
+This article is a step-by-step tutorial to deploy a pretrained PyTorch ResNet-18
+model on Adreno (with different precisions).
+
+To begin, PyTorch must be installed.
+TorchVision is also required, since we will be using it as our model zoo.
+
+A quick solution is to install it via pip:
+
+.. code-block:: bash
+
+    pip install torch
+    pip install torchvision
+
+Besides that, you should have TVM built for Android.
+See the following instructions on how to build it:
+
+`Deploy to Adreno GPU <https://tvm.apache.org/docs/how_to/deploy/adreno.html>`_
+
+After the build there should be two files in the *build* directory: «libtvm_runtime.so» and «tvm_rpc».
+Let's push them to the device and run the TVM RPC Server.
+"""
+
+######################################################################
+# TVM RPC Server
+# --------------
+# To get the hash of the device use:
+#
+# .. code-block:: bash
+#
+#    adb devices
+#
+# Then to upload these two files to the device you should use:
+#
+# .. code-block:: bash
+#
+#    adb -s <device_hash> push {libtvm_runtime.so,tvm_rpc} /data/local/tmp
+#
+# At this point you will have «libtvm_runtime.so» and «tvm_rpc» in
+# /data/local/tmp on your device.
+# Sometimes cmake can’t find «libc++_shared.so». Use:
+#
+# .. code-block:: bash
+#
+#    find ${ANDROID_NDK_HOME} -name libc++_shared.so
+#
+# to find it and also push it with adb on the desired device:
+#
+# .. code-block:: bash
+#
+#    adb -s <device_hash> push libc++_shared.so /data/local/tmp
+#
+# We are now ready to run the TVM RPC Server.
+# Launch the RPC tracker with the following line in the first console:
+#
+# .. code-block:: bash
+#
+#    python3 -m tvm.exec.rpc_tracker --port 9190
+#
+# Then run the tvm_rpc server on the desired device in a second console:
+#
+# .. code-block:: bash
+#
+#    adb -s <device_hash> reverse tcp:9190 tcp:9190
+#    adb -s <device_hash> forward tcp:9090 tcp:9090
+#    adb -s <device_hash> forward tcp:9091 tcp:9091
+#    adb -s <device_hash> forward tcp:9092 tcp:9092
+#    adb -s <device_hash> forward tcp:9093 tcp:9093
+#    adb -s <device_hash> shell LD_LIBRARY_PATH=/data/local/tmp /data/local/tmp/tvm_rpc server --host=0.0.0.0 --port=9090 --tracker=127.0.0.1:9190 --key=android --port-end=9190
+#
+# Before proceeding to compile and infer the model, specify TVM_TRACKER_HOST and TVM_TRACKER_PORT:
+#
+# .. code-block:: bash
+#
+#    export TVM_TRACKER_HOST=0.0.0.0
+#    export TVM_TRACKER_PORT=9190
+#
+# Check that the tracker is running and the device is available:
+#
+# .. code-block:: bash
+#
+#    python -m tvm.exec.query_rpc_tracker --port 9190
+#
+# For example, if we have 1 Android device,
+# the output can be:
+#
+# .. code-block:: bash
+#
+#    Queue Status
+#    ----------------------------------
+#    key          total  free  pending
+#    ----------------------------------
+#    android      1      1     0
+#    ----------------------------------
+
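+#################################################################
+# The same check can also be done from Python (a small sketch using the
+# RPC tracker session API):
+#
+# .. code-block:: python
+#
+#     from tvm import rpc
+#
+#     tracker = rpc.connect_tracker("127.0.0.1", 9190)
+#     print(tracker.text_summary())
+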
+#################################################################
+# Load a test image
+# -----------------
+# As an example we will use the classic cat image from ImageNet
+
+# sphinx_gallery_start_ignore
+from tvm import testing
+
+testing.utils.install_request_hook(depth=3)
+# sphinx_gallery_end_ignore
+
+from PIL import Image
+from tvm.contrib.download import download_testdata
+from matplotlib import pyplot as plt
+import numpy as np
+
+img_url = "https://github.com/dmlc/mxnet.js/blob/main/data/cat.png?raw=true"
+img_path = download_testdata(img_url, "cat.png", module="data")
+img = Image.open(img_path).resize((224, 224))
+plt.imshow(img)
+plt.show()
+
+# Preprocess the image and convert to tensor
+from torchvision import transforms
+
+my_preprocess = transforms.Compose(
+    [
+        transforms.Resize(256),
+        transforms.CenterCrop(224),
+        transforms.ToTensor(),
+        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
+    ]
+)
+img = my_preprocess(img)
+img = np.expand_dims(img, 0)
+
+#################################################################
+# Load a pretrained PyTorch model
+# -------------------------------
+# Create a Relay graph from a PyTorch ResNet-18 model
+import os
+import torch
+import torchvision
+import tvm
+from tvm import te
+from tvm import relay, rpc
+from tvm.contrib import utils, ndk
+from tvm.contrib import graph_executor
+
+model_name = "resnet18"
+model = getattr(torchvision.models, model_name)(pretrained=True)
+model = model.eval()
+
+# We grab the TorchScripted model via tracing
+input_shape = [1, 3, 224, 224]
+input_data = torch.randn(input_shape)
+scripted_model = torch.jit.trace(model, input_data).eval()
+
+# Input name can be arbitrary
+input_name = "input0"
+shape_list = [(input_name, img.shape)]
+mod, params = relay.frontend.from_pytorch(scripted_model, shape_list)
+
+#################################################################
+# Precisions
+# ----------
+# Since TVM supports mixed precision, we need to register the mixed-precision conversion rules:
+from tvm.relay.op import register_mixed_precision_conversion
+
+conv2d_acc = "float32"
+
+
+@register_mixed_precision_conversion("nn.conv2d", level=11)
+def conv2d_mixed_precision_rule(call_node: "relay.Call", mixed_precision_type: str):
+    global conv2d_acc
+    return [
+        relay.transform.mixed_precision.MIXED_PRECISION_ALWAYS,
+        conv2d_acc,
+        mixed_precision_type,
+    ]
+
+
+@register_mixed_precision_conversion("nn.dense", level=11)
+def dense_mixed_precision_rule(call_node: "relay.Call", mixed_precision_type: str):
+    global conv2d_acc
+    return [
+        relay.transform.mixed_precision.MIXED_PRECISION_ALWAYS,
+        conv2d_acc,
+        mixed_precision_type,
+    ]
+
+
+#################################################################
+# and also define the conversion function itself
+def convert_to_dtype(mod, dtype):
+    # downcast to float16
+    if dtype == "float16" or dtype == "float16_acc32":
+        global conv2d_acc
+        conv2d_acc = "float16" if dtype == "float16" else "float32"
+        from tvm.ir import IRModule
+
+        mod = IRModule.from_expr(mod)
+        seq = tvm.transform.Sequential(
+            [relay.transform.InferType(), relay.transform.ToMixedPrecision()]
+        )
+        with tvm.transform.PassContext(opt_level=3):
+            mod = seq(mod)
+    return mod
+
+
+#################################################################
+# Let's choose "float16_acc32" for example.
+dtype = "float16_acc32"
+mod = convert_to_dtype(mod["main"], dtype)
+dtype = "float32" if dtype == "float32" else "float16"
+
+print(mod)
+
+#################################################################
+# As you can see in the IR, the architecture now contains cast operations, which are
+# needed to convert to FP16 precision.
+# You can also use "float16" or "float32" precisions as other dtype options.
+
+#################################################################
+# Compile the model with Relay
+# ----------------------------
+# Specify the Adreno target before compiling to generate texture-leveraging
+# kernels and get all the benefits of textures.
+# Note: by default this example runs on our x86 server for demonstration.
+# If running it on an Android device, we need to
+# specify its instruction set. Set :code:`local_demo` to False if you want
+# to run this tutorial with a real device.
+
+local_demo = True
+
+# By default the CPU target is used.
+# Select from 'cpu', 'opencl' or 'vulkan'.
+test_target = "cpu"
+
+# Change target configuration.
+# Run `adb shell cat /proc/cpuinfo` to find the arch.
+arch = "arm64"
+target = tvm.target.Target("llvm -mtriple=%s-linux-android" % arch)
+
+if local_demo:
+    target = tvm.target.Target("llvm")
+elif test_target == "opencl":
+    target = tvm.target.Target("opencl", host=target)
+elif test_target == "vulkan":
+    target = tvm.target.Target("vulkan", host=target)
+
+with tvm.transform.PassContext(opt_level=3):
+    lib = relay.build(mod, target=target, params=params)
+
+#################################################################
+# Deploy the Model Remotely by RPC
+# --------------------------------
+# Using RPC you can deploy the model from the host
+# machine to the remote Adreno device.
+
+rpc_tracker_host = os.environ.get("TVM_TRACKER_HOST", "127.0.0.1")
+rpc_tracker_port = int(os.environ.get("TVM_TRACKER_PORT", 9190))
+key = "android"
+
+if local_demo:
+    remote = rpc.LocalSession()
+else:
+    tracker = rpc.connect_tracker(rpc_tracker_host, rpc_tracker_port)
+    # When running a heavy model, we should increase the `session_timeout`
+    remote = tracker.request(key, priority=0, session_timeout=60)
+
+if local_demo:
+    dev = remote.cpu(0)
+elif test_target == "opencl":
+    dev = remote.cl(0)
+elif test_target == "vulkan":
+    dev = remote.vulkan(0)
+else:
+    dev = remote.cpu(0)
+
+temp = utils.tempdir()
+dso_binary = "dev_lib_cl.so"
+dso_binary_path = temp.relpath(dso_binary)
+fcompile = ndk.create_shared if not local_demo else None
+lib.export_library(dso_binary_path, fcompile)
+remote_path = "/data/local/tmp/" + dso_binary
+remote.upload(dso_binary_path)
+rlib = remote.load_module(dso_binary)
+m = graph_executor.GraphModule(rlib["default"](dev))
+
+#################################################################
+# Run inference
+# -------------
+# We can now set the inputs, run inference on our model, and get the predictions as output
+m.set_input(input_name, tvm.nd.array(img.astype("float32")))
+m.run()
+tvm_output = m.get_output(0)
+
+#################################################################
+# Get predictions and performance statistics
+# ------------------------------------------
+# This piece of code displays the top-1 and top-5 predictions, as
+# well as information about the model's performance
+from os.path import join, isfile
+from matplotlib import pyplot as plt
+from tvm.contrib import download
+
+# Download ImageNet categories
+categ_url = "https://github.com/uwsampl/web-data/raw/main/vta/models/"
+categ_fn = "synset.txt"
+download.download(join(categ_url, categ_fn), categ_fn)
+synset = eval(open(categ_fn).read())
+
+top_categories = np.argsort(tvm_output.asnumpy()[0])
+top5 = np.flip(top_categories, axis=0)[:5]
+
+# Report top-1 classification result
+print("Top-1 id: {}, class name: {}".format(top5[1 - 1], synset[top5[1 - 1]]))
+
+# Report top-5 classification results
+print("\nTop5 predictions: \n")
+print("\t#1:", synset[top5[1 - 1]])
+print("\t#2:", synset[top5[2 - 1]])
+print("\t#3:", synset[top5[3 - 1]])
+print("\t#4:", synset[top5[4 - 1]])
+print("\t#5:", synset[top5[5 - 1]])
+print("\t", top5)
+ImageNetClassifier = False
+for k in top_categories[-5:]:
+    if "cat" in synset[k]:
+        ImageNetClassifier = True
+assert ImageNetClassifier, "Failed ImageNet classifier validation check"
+
+print("Evaluate inference time cost...")
+print(m.benchmark(dev, number=1, repeat=10))
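+
+#################################################################
+# Here ``benchmark`` collects ``repeat`` measurements, each averaging ``number``
+# end-to-end runs of the whole graph, and reports statistics (mean, median,
+# standard deviation) of the inference time.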