echuraev commented on code in PR #13791: URL: https://github.com/apache/tvm/pull/13791#discussion_r1082147944
########## apps/cpp_rtvm/README.md: ##########
@@ -0,0 +1,354 @@

<!--- Licensed to the Apache Software Foundation (ASF) under one -->
<!--- or more contributor license agreements. See the NOTICE file -->
<!--- distributed with this work for additional information -->
<!--- regarding copyright ownership. The ASF licenses this file -->
<!--- to you under the Apache License, Version 2.0 (the -->
<!--- "License"); you may not use this file except in compliance -->
<!--- with the License. You may obtain a copy of the License at -->

<!--- http://www.apache.org/licenses/LICENSE-2.0 -->

<!--- Unless required by applicable law or agreed to in writing, -->
<!--- software distributed under the License is distributed on an -->
<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
<!--- KIND, either express or implied. See the License for the -->
<!--- specific language governing permissions and limitations -->
<!--- under the License. -->


# Native Inference Application for Native C++

The native inference tool ```rtvm``` helps in deploying TVM compiled models from a standalone C++ environment.
The overall process starts with getting a model from a framework and goes all the way to running it on the target device using the `rtvm` tool.

### Models

Models can be downloaded from well known frameworks like TensorFlow, PyTorch, TFLite, ONNX, etc.
scripts/download_models.py has a reference to prepare the sample network ```resnet50``` from the Keras framework.

```bash
python3 scripts/download_models.py
```

### Auto Tuning
The auto tuning process tunes the various operators of the given model for the respective target. Auto tuning for remote devices uses ```tvm_rpc```, and we need to set up the RPC environment before we invoke tuning.
Please refer to the section [RPC setup](#rpc-setup) below for the same.

Auto tuning is necessary to obtain the best performing kernels.
We can skip this step if we already have a tuning log, or if the tuning cache is available from TopHub (implicit in the TVM compilation process).
The message below indicates that some kernels are not optimized for the selected target. In this case we can proceed with tuning for the best performance.
```One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.```

With the below environment from [RPC setup](#rpc-setup):
```bash
tvm tracker running on TVM_TRACKER_HOST
tracker port being TVM_TRACKER_PORT
rpc device access key being TVM_RPC_KEY
the model to be tuned being ./model_data/keras-resnet50/resnet50.h5
```

the below command generates the tuning cache to the file ```./model_data/keras-resnet50/keras-resnet50.log```:

```bash
python3 -m tvm.driver.tvmc tune --target="opencl" --target-host="llvm -mtriple=aarch64-linux-gnu" \
./model_data/keras-resnet50/resnet50.h5 -o ./model_data/keras-resnet50/keras-resnet50.log \
--early-stopping 0 --repeat 30 --rpc-key ${TVM_RPC_KEY} --rpc-tracker ${TVM_TRACKER_HOST}:${TVM_TRACKER_PORT} --trials 1024 \
--tuning-records ./model_data/keras-resnet50/keras-resnet50-records.log --tuner xgb
```

where
```bash
--target="opencl" refers to the OpenCL device on the Android device
--target-host="llvm -mtriple=aarch64-linux-gnu" refers to target_host being an ARM64 CPU
Options --early-stopping, --repeat, --trials, --tuner are AutoTVM specific options.
```
Please refer to the AutoTVM documentation for more details [here](https://tvm.apache.org/docs/how_to/tune_with_autotvm/index.html?highlight=autotvm).

### Compile the model

The compilation step generates the TVM compiler output artifacts, which need to be taken to the target device for deployment.
These artifacts form a compressed archive with the kernel shared library, a JSON graph description and a params binary.
The below command will generate the same:

```bash
python3 -m tvm.driver.tvmc compile --cross-compiler ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang \
--target="opencl, llvm" --target-llvm-mtriple aarch64-linux-gnu -o keras-resnet50.tar ./model_data/keras-resnet50/resnet50.h5
```

where
```
--cross-compiler : indicates the cross compiler path for kernel library generation
--target="opencl, llvm" indicates the target and host devices
```

### Test Run via RPC

At this stage we can verify the generated compiler output for execution correctness over the RPC interface.
The below command runs the compiled output on the remote target device.

With

```bash
tvm tracker running on TVM_TRACKER_HOST
tracker port being TVM_TRACKER_PORT
rpc device access key being TVM_RPC_KEY
compilation output being keras-resnet50.tar
```

```bash
python3 -m tvm.driver.tvmc run --device="opencl" keras-resnet50.tar --rpc-key ${TVM_RPC_KEY} --rpc-tracker ${TVM_TRACKER_HOST}:${TVM_TRACKER_PORT} --print-time
```

This feeds random inputs and validates the execution correctness of the compiled model.

The ```tvmc``` tool has various options to input custom data, profile the model and benchmark the execution.


### Deployment Run

Now we will verify the deployment run of the compiled model using the ```rtvm``` tool on the target device, without any RPC or host based execution.

We need to extract the tar archive on the target device.
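The extraction can also be done host-side before pushing the folder to the device. Below is a self-contained sketch using Python's tarfile module; the dummy artifact files (named after the artifacts listed in the rtvm usage: mod.so, mod.param, mod.json) are stand-ins so the snippet runs without a real build.

```python
import pathlib
import tarfile

ARTIFACTS = ("mod.so", "mod.json", "mod.param")

# Stand-in archive: create dummy artifacts with the expected names,
# so this sketch is runnable without an actual tvmc compile output.
work = pathlib.Path("demo")
work.mkdir(exist_ok=True)
for name in ARTIFACTS:
    (work / name).write_bytes(b"placeholder")
with tarfile.open("keras-resnet50.tar", "w") as tar:
    for name in ARTIFACTS:
        tar.add(work / name, arcname=name)

# Extract the archive into a folder that can then be pushed to the
# device (e.g. to /data/local/tmp/keras-resnet50/ with `adb push`).
out = pathlib.Path("keras-resnet50")
with tarfile.open("keras-resnet50.tar") as tar:
    tar.extractall(out)

print(sorted(p.name for p in out.iterdir()))  # → ['mod.json', 'mod.param', 'mod.so']
```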
We can copy the extracted contents of ```keras-resnet50.tar``` to the Android temp folder at ```/data/local/tmp/keras-resnet50/```.

Also copy the cross compiled tool ```rtvm``` and ```libtvm_runtime.so``` to ```/data/local/tmp/```.

```rtvm``` usage can be queried as below:
```bash
Android:/data/local/tmp $ LD_LIBRARY_PATH=./ ./rtvm
Command line usage
--model     - The folder containing tvm artifacts (mod.so, mod.param, mod.json)
--device    - The target device to use {llvm, opencl, cpu, cuda, metal, rocm, vpi, oneapi}
--input     - Numpy file for the model input (optional; random input is used if not given)
--output    - Numpy file name to dump the model output as numpy
--dump-meta - Dump model meta information

  Example
  ./rtvm --model=keras-resnet50 --device="opencl" --dump-meta
  ./rtvm --model=keras-resnet50 --device="opencl" --input input.npz --output=output.npz
```

```rtvm``` can run the model with no inputs (just a dry run without any valid inputs) and also with a specific input supplied as a numpy npz format file.

We can create an npz dump for all inputs by saving the dict object as shown below.

With ```keras-resnet50``` having one input ```input_1``` with shape ```[1, 224, 224, 3]``` and dtype ```float32```:

```python
import numpy as np

# Random initialization
input1 = np.random.uniform(low=-1, high=1, size=(1, 224, 224, 3)).astype("float32")
dataset = {"input_1": input1}
np.savez("input.npz", **dataset)
```

Copy ```input.npz``` to the target device as well, as ```/data/local/tmp/input.npz```.


Now, on the Android shell, we can do a dry run as well as a run with a specific input, as shown below.
```bash
# Query meta data information
Android:/data/local/tmp/ $ LD_LIBRARY_PATH=./ ./rtvm --model keras-resnet50 --device "opencl" --dump-meta
. . . . . .
Meta Information:keras-resnet50
    Number of Inputs:183
    Number of Outputs:1
    Input MetaInfo:
        Input:input_1
        DType:float32
        Shape:[1, 224, 224, 3]
    Output MetaInfo:
        Output:tvmgen_default_fused_nn_softmax
        DType:float32
        Shape:[1, 1000]
. . . . . .

# Dry run without any inputs
Android:/data/local/tmp/ $ LD_LIBRARY_PATH=./ ./rtvm --model keras-resnet50 --device "opencl"
Model = keras-resnet50
Device = opencl
Input =
Output =
Dump Metadata = False
TVMRunner Constructor:keras-resnet50 Devices:opencl
TVMRunner Load:keras-resnet50
TVMRunner::GetMetaInfo
Executing dry run ...
Set Random Input for :input_1
TVMRunner::GetInputMemSize:input_1
Random Input Size:602112 bytes
TVMRunner::SetInput (Raw)
TVMRunner::Run
Get Output for :tvmgen_default_fused_nn_softmax
TVMRunner::GetOutputMemSize:tvmgen_default_fused_nn_softmax
TVMRunner::GetOutput (Raw)
Output Size:4000 bytes


# Run with input and dump output as npz file
Android:/data/local/tmp/ $ LD_LIBRARY_PATH=./ ./rtvm --model keras-resnet50 --device "opencl"
```

Review Comment: There is no difference with the previous command. Should we pass parameters `--input` and `--output`?

########## apps/cpp_rtvm/README.md: ##########

We can create npz dump for all inputs by saving the dict object as hown below.

Review Comment:
```suggestion
We can create npz dump for all inputs by saving the dict object as shown below.
```

########## apps/cpp_rtvm/README.md: ##########
```bash
# Run with input and dump output as npz file
Android:/data/local/tmp/ $ LD_LIBRARY_PATH=./ ./rtvm --model keras-resnet50 --device "opencl"
Model = keras-resnet50
Device = opencl
Input = input.npz
Output = output.npz
Dump Metadata = False
TVMRunner Constructor:keras-resnet50 Devices:opencl
TVMRunner Load:keras-resnet50
TVMRunner::GetMetaInfo
Executing with Input:input.npz Output:output.npz
TVMRunner::SetInput (Numpy):input.npz
Set Numpy Input for :input_1
TVMRunner::Run
TVMRunner::GetOutput (Numpy):output.npz
Get Output for :tvmgen_default_fused_nn_softmax
Output Size:4000 bytes
```

output.npz contains the model outputs. Below is a quick look at its contents.
```bash
tvm-host:~$ unzip -l output.npz
Archive:  output.npz
  Length      Date    Time    Name
---------  ---------- -----   ----
     4080  1980-00-00 00:00   tvmgen_default_fused_nn_softmax.npy
---------                     -------
     4080                     1 file
```

Building ```cpp_rtvm``` also produces ```libtvm_runner.so```, a simplified interface that rtvm uses internally for loading and executing tvm compiled models from C/C++ environments.
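Back on the host, the dumped output.npz can be post-processed with plain numpy. A minimal, self-contained sketch: the key name is the output node name from the listing above, and a random array stands in here for a real dump.

```python
import numpy as np

key = "tvmgen_default_fused_nn_softmax"

# Stand-in for a real dump: rtvm stores the output tensor under its node name.
np.savez("output.npz", **{key: np.random.uniform(size=(1, 1000)).astype("float32")})

out = np.load("output.npz")
probs = out[key]
top5 = np.argsort(probs[0])[::-1][:5]  # indices of the five highest scores
print(probs.shape, probs.dtype)        # (1, 1000) float32
print(top5)
```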
```tvm_runner.h``` describes the interface definition. Alternatively, advanced users can use TVM's [c_native_api](https://github.com/apache/tvm/blob/main/include/tvm/runtime/c_runtime_api.h) interface for more access to TVM features.


# RPC Setup

For Android devices we require cross compilation of tvm_rpc (and also libtvm_runtime.so, which is a dependency) for the remote device.
RPC setup involves running the tracker on the host device and running tvm_rpc on the target device.

### Tracker

The below command runs the tracker on the host over port ```9100```:

```bash
python3 -m tvm.exec.rpc_tracker --host 127.0.0.1 --port 9100
```
### RPC on Target

With ```abcd1234ef``` being the adb device id, and tvm_rpc (and libtvm_runtime.so) pushed to the target device at ```/data/local/tmp/tvm_rpc/```:

```bash
export ANDROID_SERIAL=abcd1234ef
# The below settings reroute networking tcp connections on the device to the host device via the adb interface
adb reverse tcp:9100 tcp:9100
adb forward tcp:5000 tcp:5000
# Run tvm_rpc on the device
env adb shell "cd /data/local/tmp/tvm_rpc; killall -9 tvm_rpc; \
LD_LIBRARY_PATH=/data/local/tmp/tvm_rpc/ ./tvm_rpc server --host=0.0.0.0 --port=5000 --port-end=5010 --tracker=127.0.0.1:9100 --key=android"
```

Now we have the RPC setup with ```TVM_TRACKER_HOST=127.0.0.1```, ```TVM_TRACKER_PORT=9100``` and ```TVM_RPC_KEY=android```.

We can also check connected and available devices on the tracker, as shown below.
```bash
python3 -m tvm.exec.query_rpc_tracker --port ${TVM_TRACKER_PORT}
Tracker address 127.0.0.1:9100

Server List
------------------------------
server-address           key
------------------------------
 127.0.0.1:5000    server:android
------------------------------

Queue Status
-------------------------------
key       total  free  pending
-------------------------------
android   1      1     0
-------------------------------
```


# Target Specific Configuration

The below sections describe device/target specific settings to be used with the ```tvmc``` and ```rtvm``` tools.

Review Comment: The instruction in this section is only related to the `tvmc`, isn't it?
+ +### Models + +Models can be downloaded from well known frameworks like Tensorflow, PyTorch, TFLite, Onnx ..etc. +scripts/download_models.py has a reference to prepare sample network ```resnet50``` from keras framework. + +```bash +python3 scripts/download_models.py +``` + +### Auto Tuning +Auto tuning process tunes various operatrors the given model for respective target. Auto tuning for remote devices use ```tvm_rpc``` and we need to setup the rpc environment before we invoke tuning. +Please refer below section [RPC setup](#rpc-setup) for the same. + +Auto tunng is necessary to obtain best performaning kernels. We can skip this step if we have tuning log already or the tuning cache is available from tophub (implicite by TVM compilation process). +Below message indicate that there exists some kernels not optimized for the selected target. In this case we can proceed with tuning to best performance. +```One or more operators have not been tuned. Please tune your model for better performance. 
Use DEBUG logging level to see more details.``` + +with below environment from [RPC setup](#rpc-setup) +``` bash +tvm tracker running on ```TVM_TRACKER_HOST``` +tracker port being ```TVM_TRACKER_PORT``` +rpc device access key being ```TVM_RPC_KEY``` +the model to be tuned being ```./model_data/keras-resnet50/resnet50.h5``` +``` + +the below command we can generate the tuning cache to file ```./model_data/keras-resnet50/keras-resnet50.log``` + +```bash +python3 -m tvm.driver.tvmc tune --target="opencl" --target-host="llvm -mtriple=aarch64-linux-gnu" \ +./model_data/keras-resnet50/resnet50.h5 -o ./model_data/keras-resnet50/keras-resnet50.log \ +--early-stopping 0 --repeat 30 --rpc-key ${TVM_RPC_KEY} --rpc-tracker ${TVM_TRACKER_HOST}:${TVM_TRACKER_PORT} --trials 1024 \ +--tuning-records ./model_data/keras-resnet50/keras-resnet50-records.log --tuner xgb +``` + +where +```bash +--target="opencl" refers to opencl device on Android device +--target-host="llvm -mtriple=aarch64-linux-gnu" refers to target_host being an ARM64 CPU +Options --early-stopping, --repeat, --trials, --tuner are Auto TVM specific options. +``` +Please refer to AutoTVM documentation for more details [here](https://tvm.apache.org/docs/how_to/tune_with_autotvm/index.html?highlight=autotvm). + +### Compile the model + +Compilation step generates TVM compiler output artifacts which need to be taken to target device for deployment. +These artifacts is a compressed archive with kernel shared lib, json with graph description and params binary. 
+ +Below command will generate the same + + +```bash +python3 -m tvm.driver.tvmc compile --cross-compiler ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang \ +--target="opencl, llvm" --target-llvm-mtriple aarch64-linux-gnu -o keras-resnet50.tar ./model_data/keras-resnet50/resnet50.h5 +``` + +where +``` +--cross-compiler : Indicates the cross compiler path for kernel library generation +--target="opencl, llvm" indicates target and host devices +``` + +### Test Run via RPC + +At this stage we can verify the generated compiler output for execution correctness over the RPC setup interface. +Below command can run the compiled output on remote target device. + +with + +``` bash +tvm tracker running on ```TVM_TRACKER_HOST``` +tracker port being ```TVM_TRACKER_PORT``` +rpc device access key being ```TVM_RPC_KEY``` +compilation out being keras-resnet50.tar +``` + +```bash +python3 -m tvm.driver.tvmc run --device="opencl" keras-resnet50.tar --rpc-key ${TVM_RPC_KEY} --rpc-tracker ${TVM_TRACKER_HOST}:${TVM_TRACKER_PORT} --print-time +``` + +This inputs random inputs and validates the execution correctness of the compiled model. + +```tvmc``` tool has various options to input custom data, profile the model and benchmark the execution. + + +### Deployment Run + +Now we will verify the deployment run of the compiled model using ```rtvm``` tool on target device without any RPC or host based execution. + +We need to extract the tar achive on target device. 
We can copy the extracted contents of ```keras-resnet50.tar``` under Android temp folder at ```/data/local/tmp/keras-resnet50/``` + +Also copy the cross compiled tool ```rtvm``` and ```libtvm_runtime.so``` to ```data/local/tmp/``` + +```rtvm``` usage can be quired as below +```bash +Android:/data/local/tmp $ LD_LIBRARY_PATH=./ ./rtvm +Command line usage +--model - The folder containing tvm artifacts(mod.so, mod.param, mod.json) +--device - The target device to use {llvm, opencl, cpu, cuda, metal, rocm, vpi, oneapi} +--input - Numpy file for the model input (optional and we use random of not given) +--output - Numpy file name to dump the model output as numpy +--dump-meta - Dump model meta information + + Example + ./rtvm --model=keras-resnet50 --device="opencl" --dump-meta + ./rtvm --model=keras-resnet50 --device="opencl" --input input.npz --output=output.npz +``` + +```rtvm``` can run the model using no inputs (just a dry run without any valid inputs) and also with specific input supplied as a numpy npz format file. + +We can create npz dump for all inputs by saving the dict object as hown below. + +With ```keras-resnet50``` having one input ```input_1``` with shape ```[1, 224, 224, 3]``` and dtype ```float32``` + +``` +# Random initilization +input1 = np.random.uniform(low=-1, high=1, size=(1, 224, 224, 3)).astype("float32") +dataset = {"input_1": input1} +np.savez("input.npz", **dataset) +``` + +Copy ```input.npz``` also to the target device as ```/data/local/tmp/input.npz``` + + +Now, on Android shell we can do a dry run as well as with specific input as shown below. +```bash +# Query meta data information +Android:/data/local/tmp/ $ LD_LIBRARY_PATH=./ ./rtvm --model keras-resnet50 --device "opencl" --dump-meta +. . . . . . 
+Meta Information:keras-resnet50
+ Number of Inputs:183
+ Number of Outputs:1
+ Input MetaInfo:
+ Input:input_1
+ DType:float32
+ Shape:[1, 224, 224, 3]
+ Output MetaInfo:
+ Output:tvmgen_default_fused_nn_softmax
+ DType:float32
+ Shape:[1, 1000]
+. . . . . .
+
+# Dry run without any inputs
+Android:/data/local/tmp/ $ LD_LIBRARY_PATH=./ ./rtvm --model keras-resnet50 --device "opencl"
+Model = keras-resnet50
+Device = opencl
+Input =
+Output =
+Dump Metadata = False
+TVMRunner Constructor:keras-resnet50 Devices:opencl
+TVMRunner Load:keras-resnet50
+TVMRunner::GetMetaInfo
+Executing dry run ...
+Set Random Input for :input_1
+TVMRunner::GetInputMemSize:input_1
+Random Input Size:602112 bytes
+TVMRunner::SetInput (Raw)
+TVMRunner::Run
+Get Output for :tvmgen_default_fused_nn_softmax
+TVMRunner::GetOutputMemSize:tvmgen_default_fused_nn_softmax
+TVMRunner::GetOutput (Raw)
+Output Size:4000 bytes
+
+
+# Run with input and dump output as npz file
+Android:/data/local/tmp/ $ LD_LIBRARY_PATH=./ ./rtvm --model keras-resnet50 --device "opencl" --input input.npz --output output.npz
+Model = keras-resnet50
+Device = opencl
+Input = input.npz
+Output = output.npz
+Dump Metadata = False
+TVMRunner Constructor:keras-resnet50 Devices:opencl
+TVMRunner Load:keras-resnet50
+TVMRunner::GetMetaInfo
+Executing with Input:input.npz Output:output.npz
+TVMRunner::SetInput (Numpy):input.npz
+Set Numpy Input for :input_1
+TVMRunner::Run
+TVMRunner::GetOutput (Numpy):output.npz
+Get Output for :tvmgen_default_fused_nn_softmax
+Output Size:4000 bytes
+```
+
+output.npz contains the model outputs. Below is a quick look at its contents.
+```bash
+tvm-host:~$ unzip -l output.npz
+Archive: output.npz
+ Length Date Time Name
+--------- ---------- ----- ----
+ 4080 1980-00-00 00:00 tvmgen_default_fused_nn_softmax.npy
+--------- -------
+ 4080 1 file
+
+```
+
+Building ```cpp_rtvm``` also produces ```libtvm_runner.so```, a simplified interface that ```rtvm``` uses internally for loading and executing tvm compiled models from C/C++ environments.
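Back on the host, the dumped npz file can be inspected with NumPy. A minimal sketch; the random scores generated here stand in for the real `output.npz` copied back from the device:

```python
import numpy as np

# Stand-in scores: in practice just load the real file copied from
# the device, i.e. out = np.load("output.npz")
scores = np.random.uniform(size=(1, 1000)).astype("float32")
np.savez("output.npz", tvmgen_default_fused_nn_softmax=scores)

out = np.load("output.npz")
probs = out["tvmgen_default_fused_nn_softmax"]
print(probs.shape)           # (1, 1000)
print(int(probs.argmax()))   # index of the top-1 class
```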
+```tvm_runner.h``` describes this interface definition. Alternatively, advanced users can use TVM's [c_runtime_api](https://github.com/apache/tvm/blob/main/include/tvm/runtime/c_runtime_api.h) interface for fuller access to TVM features.
+
+
+# RPC Setup
+
+For Android devices we require cross compilation of tvm_rpc (and its dependency libtvm_runtime.so) for the remote device.
+RPC setup involves running the tracker on the host and running tvm_rpc on the target device.
+
+### Tracker
+
+The below command runs the tracker on the host over port ```9100```
+
+```bash
+python3 -m tvm.exec.rpc_tracker --host 127.0.0.1 --port 9100
+```
+### RPC on Target
+
+With ```abcd1234ef``` being the adb device id, and tvm_rpc (and libtvm_runtime.so) pushed to the target device at ```/data/local/tmp/tvm_rpc/```
+
+```bash
+export ANDROID_SERIAL=abcd1234ef
+# The below settings reroute tcp connections on the device to the host via the adb interface
+adb reverse tcp:9100 tcp:9100
+adb forward tcp:5000 tcp:5000
+# Run tvm_rpc on the device
+env adb shell "cd /data/local/tmp/tvm_rpc; killall -9 tvm_rpc; \
+LD_LIBRARY_PATH=/data/local/tmp/tvm_rpc/ ./tvm_rpc server --host=0.0.0.0 --port=5000 --port-end=5010 --tracker=127.0.0.1:9100 --key=android"
+```
+
+Now we have the rpc setup with ```TVM_TRACKER_HOST=127.0.0.1```, ```TVM_TRACKER_PORT=9100``` and ```TVM_RPC_KEY=android```.
+
+We can also check the connected and available devices on the tracker as shown below.
+
+```bash
+python3 -m tvm.exec.query_rpc_tracker --port ${TVM_TRACKER_PORT}
+Tracker address 127.0.0.1:9100
+
+Server List
+------------------------------
+server-address key
+------------------------------
+ 127.0.0.1:5000 server:android
+------------------------------
+
+Queue Status
+-------------------------------
+key total free pending
+-------------------------------
+android 1 1 0
+-------------------------------
+```
+
+
+# Target Specific Configuration
+
+The below sections describe device/target specific settings to be used with the ```tvmc``` and ```rtvm``` tools.
+
+### Adreno GPU
+
+Adreno GPU has a docker definition that helps ease setting up the development environment.
+
+We can build the docker image by using the below command from the TVM repo.
+
+```bash
+./docker/build.sh ci_adreno
+docker tag tvm.ci_adreno ci_adreno
+```
+
+The below command builds the host and target rpc components for Adreno and drops into an interactive shell.
+
+```bash
+./tests/scripts/ci.py adreno -i
+```
+
+Also, one can build with Adreno OpenCLML SDK support
+
+```bash
+export ADRENO_OPENCL=<Path to OpenCLML SDK>
+./tests/scripts/ci.py adreno -i
+```
+
+The above command produces
+```build-adreno``` which is the host build
+```build-adreno-target``` which contains the cross compiled tvm_rpc and libtvm_runtime.so
+
+
+The below options are to be used for Adreno GPU while working with tvmc
+
+* Tuning
+
+```
+--target="opencl -device=adreno"
+--target-host="llvm -mtriple=aarch64-linux-gnu"
+```
+
+* Compilation
+
+```
+--cross-compiler ${ANDROID_NDK_HOME}/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android28-clang
+--target="opencl, llvm"
+--target-opencl-device adreno
+--target-llvm-mtriple aarch64-linux-gnu
+```
+
+While enabling CLML, we just need to specify the below target option for compilation.

Review Comment:
   Please add a 2-space indent for all pieces of code and information that should sit under the bullet.

-- 
This is an automated message from the Apache Git Service.
