srkreddy1238 commented on code in PR #13867:
URL: https://github.com/apache/tvm/pull/13867#discussion_r1104034871


##########
docs/how_to/deploy/adreno.rst:
##########
@@ -65,134 +78,483 @@ Reasons of using textures:
 Overall, with textures, it is possible to achieve a significant performance boost
 compared to OpenCL buffer based solutions.
 
-.. _building_tvm_for_adreno:
+In general we specify the target as ``target="opencl"`` for a regular OpenCL based target, which generates kernels as shown below.
 
-Building TVM for Adreno
------------------------
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__global float* restrict p0, __global double* restrict p1, __global float* restrict conv2d_nhwc) {
+   // body..
+
+The above OpenCL kernel definition has ``__global float*`` pointers, which are essentially OpenCL ``buffer`` objects.
+
+When texture based enhancements are enabled by modifying the target definition to ``target="opencl -device=adreno"``, the generated kernels use texture backed OpenCL image objects, as shown below.
+
+.. code:: c
+
+   __kernel void tvmgen_default_fused_nn_conv2d_kernel0(__write_only image2d_t pad_temp_global_texture, __read_only image2d_t p0) {
+   // body..
+
+*image2d_t* is a built-in OpenCL type that represents a two-dimensional image object and provides several additional functions.
+When we use *image2d_t* we read *4 elements at one time*, which helps to utilize the hardware more efficiently.
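The *4 elements at one time* access corresponds to packing the channel dimension into groups of four so that one RGBA texel carries four consecutive channel values (an NCHW4c style layout). Below is a minimal NumPy sketch of such a packing; it only illustrates the idea and is not TVM's actual layout transform:

```python
import numpy as np

def pack_nchw_to_nchw4c(data):
    """Pack an NCHW tensor into NCHW4c so that the innermost axis
    holds 4 consecutive channels -- one RGBA texel per load."""
    n, c, h, w = data.shape
    assert c % 4 == 0, "channel count must be a multiple of 4 for this sketch"
    # Split C into (C//4, 4) and move the 4-wide chunk to the innermost axis.
    return data.reshape(n, c // 4, 4, h, w).transpose(0, 1, 3, 4, 2)

x = np.arange(2 * 8 * 3 * 3, dtype=np.float32).reshape(2, 8, 3, 3)
packed = pack_nchw_to_nchw4c(x)
print(packed.shape)  # (2, 2, 3, 3, 4): each innermost vector maps to one texel
```

Each innermost length-4 vector then maps naturally onto a single texel access such as ``read_imagef``/``write_imagef`` in the generated kernels.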
+
+Please refer to :ref:`Advanced Usage<advanced_usage>` for more details about generation and inspection of kernel sources.
+
+
+.. _about_openclml:
 
-This section gives instructions on how to build the Android part of TVM
-with OpenCL and TVM RPC Server in order to deploy models on Adreno.
+About OpenCLML
+--------------
 
-Since the process of building TVM for Adreno is exactly the same as the
-process of building TVM for Android, please refer to these instructions:
-`TVM RPC
-Server <https://github.com/apache/tvm/tree/main/apps/cpp_rpc>`_.
+OpenCLML is an SDK released by Qualcomm that provides accelerated deep learning operators.
+These operators are exposed as the extension "cl_qcom_ml_ops" to the standard OpenCL specification.
+Please refer to `Accelerate your models with our OpenCL ML SDK <https://developer.qualcomm.com/blog/accelerate-your-models-our-opencl-ml-sdk>`_ for more details.
 
-Since there are many required packages for Android, you can use the official 
Docker Image to build TVM.
-For more information refer to this guide: `Deploy the Pretrained Model on 
Android 
<https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_android.html>`_.
+OpenCLML is integrated into TVM as a `BYOC <https://tvm.apache.org/docs/dev/how_to/relay_bring_your_own_codegen.html?highlight=bring%20your%20own>`_ solution.
+OpenCLML operators can use the same context and can be enqueued on the same command queue as native OpenCL.
+We take advantage of this to avoid any context switching overheads when falling back to native OpenCL.
+
+
+.. _build_deploy:
+
+TVM for Adreno™
+---------------
+
+This section gives instructions on various ways of building and deploying models
+to the Adreno™ target. Adreno™ is a remote target which is connected to the host via an ADB connection.
+Deploying the compiled model requires the use of some tools on the host as well as on the target.
+
+TVM offers simplified, user friendly command line tools as well as a
+developer centric Python API interface for various steps like auto tuning, building and deploying.
+
+The TVM compilation process for remote devices has multiple stages, listed below.
+
+**Model import:**
+At this stage we import a model from well known frameworks like TensorFlow, PyTorch, ONNX, etc.
+This stage converts the given model into TVM's relay module format. Alternatively, one can build a relay module manually
+using TVM's operator inventory. The TVM module generated here is a target independent representation of the graph.
+
+**Auto Tuning:**
+At this stage we tune the TVM generated kernels specific to a target. The auto tuning process requires
+target device availability, and in the case of a remote target like Adreno™ on an Android device we use an RPC setup for communication.
+Later sections in this guide detail the RPC setup for Android devices. Auto tuning is not a necessary step for
+compilation of a model, but it is necessary for achieving the best performance out of TVM generated kernels.
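The tuning log produced here is a plain text file of JSON records, one per measured candidate, and later compilation picks the fastest entry per task. That selection step can be sketched with a reduced, synthetic schema (not the exact AutoTVM log format):

```python
import json

# Synthetic tuning-log lines: each record holds a candidate kernel config
# and its measured run times in seconds (reduced schema for illustration).
log_lines = [
    '{"config": {"tile": 2}, "costs": [0.0042, 0.0041]}',
    '{"config": {"tile": 4}, "costs": [0.0019, 0.0021]}',
    '{"config": {"tile": 8}, "costs": [0.0035, 0.0033]}',
]

def best_config(lines):
    """Return the config with the lowest mean measured cost."""
    records = [json.loads(line) for line in lines]
    return min(records, key=lambda r: sum(r["costs"]) / len(r["costs"]))["config"]

print(best_config(log_lines))  # {'tile': 4}
```

Real AutoTVM records carry much more information (task, target, the full config entity), but the best-entry selection used during compilation follows this same idea.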
+
+**Compilation:**
+At this stage we compile the model for a specific target. If we auto tuned the module in the previous stage,
+TVM compilation makes use of the tuning log to generate the best performing kernels. The TVM compilation process produces artifacts
+containing the kernel shared library, the graph definition in JSON format, and a parameters binary file in a TVM specific format.
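These artifacts are ordinary files that are typically bundled together before being pushed to the device. A schematic sketch with placeholder contents and hypothetical file names (the JSON stand-in is not TVM's actual graph schema):

```python
import json, tarfile, tempfile
from pathlib import Path

# Stand-in artifacts: kernel shared lib, graph json, params blob.
# Contents are placeholders; real artifacts come from TVM compilation.
workdir = Path(tempfile.mkdtemp())
(workdir / "mod.so").write_bytes(b"\x7fELF...")               # kernel shared library
(workdir / "mod.json").write_text(json.dumps({"nodes": []}))  # graph definition
(workdir / "mod.params").write_bytes(b"\x00" * 16)            # parameters binary

# Bundle the three pieces so they can be shipped to the device together.
bundle = workdir / "model.tar"
with tarfile.open(bundle, "w") as tar:
    for name in ("mod.so", "mod.json", "mod.params"):
        tar.add(workdir / name, arcname=name)

with tarfile.open(bundle) as tar:
    print(sorted(tar.getnames()))  # ['mod.json', 'mod.params', 'mod.so']
```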
+
+**Deploy (or test run) on Target:**
+At this stage we run the TVM compilation output on the target. Deployment is possible from a Python
+environment using the RPC setup, and also using TVM's native tool, a binary cross compiled for Android.
+At this stage we can run the compiled model on the Android target and unit test the
+output for correctness and performance.
+
+**Application Integration:**
+This stage is all about integrating the TVM compiled model in applications. Here we discuss
+interfacing with the TVM runtime from Android (C++ native environment or JNI) for setting inputs and getting outputs.
+
+**Advanced Usage:**
+This section covers advanced user topics like viewing generated source code, altering the precision of the module, etc.
+
+
+This tutorial covers all the above aspects in the sections below.
+
+- :ref:`Development environment<development_environment>`
+- :ref:`RPC Setup<rpc_setup>`
+- :ref:`Commandline tools<commandline_interface>`
+- :ref:`Python interface<python_interface>`
+- :ref:`Application Integration<application_integration>`
+- :ref:`Advanced Usage<advanced_usage>`
+
+.. _development_environment:
+
+
+Development Environment Setup : Automatic
+-----------------------------------------
+TVM ships a predefined docker container environment with all prerequisites to get started quickly.
+You may also refer to :ref:`Manual Environment Setup<manual_setup>` for more control over the dependencies.
+
+For the docker setup the only prerequisite is the availability of the docker tool on the host.
+
+The commands below build a docker image for Adreno.
+
+::
 
-**Prerequisites**: Android NDK and Android Debug Bridge must
-be installed, the desired device must have OpenCL support and Android part of 
TVM must be built:
+   ./docker/build.sh ci_adreno
+   docker tag tvm.ci_adreno ci_adreno
+
+
+Now we can build both host and target utils with the command below.
+
+::
+
+   ./tests/scripts/ci.py adreno -i
+
+To build TVM with the OpenCLML SDK we need to export the OpenCLML SDK path as shown below while building.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   ./tests/scripts/ci.py adreno -i
+
+On successful compilation this leaves us in a docker shell. The build produces two folders:
+
+* build-adreno: The host side TVM compiler build.
+* build-adreno-target: Contains the Android target components:
+
+    * libtvm_runtime.so: TVM runtime library
+    * tvm_rpc: The RPC runtime environment tool
+    * rtvm: A native standalone tool
+
+While using the docker environment the Android device is shared with the host. Hence, it is required
+to have adb version "1.0.41" on the host, as the docker image uses the same version.
+
+We can check adb device availability inside the docker environment too.
+
+::
+
+   user@ci-adreno-fpeqs:~$ adb devices
+   List of devices attached
+   aaaabbbb    device
+   ccccdddd    device
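When several devices are attached, scripts often need the serial of a specific device (for example to set ``ANDROID_SERIAL`` before running the RPC tools). A small hypothetical helper that parses the ``adb devices`` output shown above:

```python
def parse_adb_devices(output):
    """Extract device serials from `adb devices` output (hypothetical helper)."""
    lines = output.strip().splitlines()[1:]  # skip the "List of devices attached" header
    return [line.split()[0] for line in lines if line.strip().endswith("device")]

sample = """List of devices attached
aaaabbbb    device
ccccdddd    device
"""
print(parse_adb_devices(sample))  # ['aaaabbbb', 'ccccdddd']
```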
+
+.. _manual_setup:
+
+Development Environment Setup : Manual
+--------------------------------------
+
+The manual build process requires building the host and target components.
+
+The commands below will configure and build the host compiler.
+
+::
+
+   mkdir -p build
+   cd build
+   cp ../cmake/config.cmake .
+
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_LLVM ON\) >> config.cmake
+
+Additionally, we can append the config entry below to compile with OpenCLML support.
+
+::
+
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML ${ADRENO_OPENCL}\) >> config.cmake
+
+Now we can build as shown below.
+
+::
+
+   cmake ..
+   make
+
+Finally we can export the python path as
+
+::
+
+   export PYTHONPATH=$PWD/python:$PYTHONPATH
+   python3 -c "import tvm" # Verify tvm python package
+
+
+Now we can configure and build the target components with the configuration below.
+The target build requires the Android NDK to be installed.
 
 - Read documentation about *Android NDK installation* here: 
https://developer.android.com/ndk
 - To get access to adb tools you can see *Android Debug Bridge installation* 
here: https://developer.android.com/studio/command-line/adb
 
-You can also build the android part of TVM locally. From the root
-folder of TVM:
 
 ::
 
-   mkdir build_android
-   cd build_android
-   cmake .. -DUSE_OPENCL=ON 
-DCMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake 
-DANDROID_ABI=arm64-v8a -DANDROID_NATIVE_API_LEVEL=android-28 
-DCMAKE_FIND_ROOT_PATH_MODE_PACKAGE=ON -DANDROID_STL=c++_static -DUSE_CPP_RPC=ON
-   make -jN tvm_runtime tvm_rpc
+   mkdir -p build-adreno
+   cd build-adreno
+   cp ../cmake/config.cmake .
+   echo set\(USE_MICRO OFF\) >> config.cmake
+   echo set\(USE_OPENCL ON\) >> config.cmake
+   echo set\(USE_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RPC ON\) >> config.cmake
+   echo set\(USE_CPP_RTVM ON\) >> config.cmake
+   echo set\(USE_GRAPH_EXECUTOR ON\) >> config.cmake
+   echo set\(USE_LIBBACKTRACE AUTO\) >> config.cmake
+   echo set\(USE_KALLOC_ALIGNMENT 32\) >> config.cmake
 
-where **N** is the number of cores available on your *CPU*.
+   echo set\(ANDROID_ABI arm64-v8a\) >> config.cmake
+   echo set\(ANDROID_PLATFORM android-28\) >> config.cmake
+   echo set\(MACHINE_NAME aarch64-linux-gnu\) >> config.cmake
 
-At this stage you have built TVM for Adreno.
+Additionally, we can append the config below to compile with OpenCLML support.
 
-.. _build_and_deploy_model_for_adreno:
+::
 
-Build and deploy model for Adreno
----------------------------------
+   export ADRENO_OPENCL=<Path to OpenCLML SDK>
+   echo set\(USE_CLML "${ADRENO_OPENCL}"\) >> config.cmake
+   echo set\(USE_CLML_GRAPH_EXECUTOR "${ADRENO_OPENCL}"\) >> config.cmake
 
-In this section we will focus on target, needed to compile and deploy models 
for Adreno, demonstrate
-the differences in generated kernels with and without textures and, in 
addition, the
-possibility of choosing a different precision for model compilation will
-be considered.
+For the Android target build, ``ANDROID_NDK_HOME`` is a dependency and should be set as an environment variable.
+The commands below will build the Adreno™ target components.
 
-For the complete step-py-step process of compiling and deploying models on
-Adreno, including selection of precision, running the inference of the
-model, getting the predictions, and measuring the performance please refer to 
this tutorial: `How To Deploy model on Adreno 
<https://tvm.apache.org/docs/how_to/deploy_models/deploy_model_on_adreno.html>`_
+::
 
-|Android deployment pipeline|
+   cmake -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK_HOME}/build/cmake/android.toolchain.cmake" \
+      -DANDROID_ABI=arm64-v8a \
+      -DANDROID_PLATFORM=android-28 \
+      -DCMAKE_SYSTEM_VERSION=1 \
+      -DCMAKE_FIND_ROOT_PATH="${ADRENO_OPENCL}" \
+      -DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
+      -DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
+      -DCMAKE_CXX_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang++" \
+      -DCMAKE_C_COMPILER="${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang" \
+      -DMACHINE_NAME="aarch64-linux-gnu" ..
 
-*Fig.2 Deployment pipeline on Adreno devices*
+   make tvm_runtime tvm_rpc rtvm
 
-The figure above demonstrates a generalized pipeline for deploying and running 
neural network models on android devices.
-As can be seen from the figure, the compiled model has a set_input() and a 
run() methods,
-which *prepare the inputs* for inference and *execute the inference* on the 
remote device using the Graph Executor runtime module.
 
-Adreno target
-~~~~~~~~~~~~~
+.. _rpc_setup:
 
-Normally, when compiling models for Android using OpenCL, the
-corresponding target is used
+RPC Setup

Review Comment:
   Makes sense. I will improve here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
