comaniac commented on a change in pull request #6395: URL: https://github.com/apache/incubator-tvm/pull/6395#discussion_r500463790
########## File path: docs/deploy/tensorrt.rst ########## @@ -0,0 +1,288 @@ +.. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +Relay TensorRT Integration +========================== +**Author**: `Trevor Morris <https://github.com/trevor-m>`_ + +Introduction +------------ + +NVIDIA TensorRT is a library for optimized deep learning inference. This integration will offload as +many operators as possible from Relay to TensorRT, providing a performance boost on NVIDIA GPUs +without the need to tune schedules. + +This guide will demonstrate how to install TensorRT and build TVM with TensorRT BYOC and runtime +enabled. It will also provide example code to compile and run a ResNet-18 model using TensorRT and +how to configure the compilation and runtime settings. Finally, we document the supported operators +and how to extend the integration to support other operators. + +Installing TensorRT +------------------- + +In order to download TensorRT, you will need to create an NVIDIA Developer program account. Please +see NVIDIA's documentation for more info: +https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html. 
If you have a Jetson device +such as a TX1, TX2, Xavier, or Nano, TensorRT will already be installed on the device via the +JetPack SDK. + +There are two methods to install TensorRT: + +* System install via deb or rpm package. +* Tar file installation. + +With the tar file installation method, you must provide the path of the extracted tar archive to +USE_TENSORT_GRAPH_RUNTIME=/path/to/TensorRT. With the system install method, +USE_TENSORT_GRAPH_RUNTIME=ON will automatically locate your installation. + +Building TVM with TensorRT support +---------------------------------- + +There are two separate build flags for TensorRT integration in TVM: + +* USE_TENSORT=ON/OFF - This flag will enable compiling a TensorRT module, which does not require any +TensorRT library. +* USE_TENSORT_GRAPH_RUNTIME=ON/OFF/path-to-TensorRT - This flag will enable the TensorRT runtime +module. This will build TVM against the TensorRT libraries. + +Example setting in config.cmake file: + +.. code:: cmake + + set(USE_TENSORRT ON) + set(USE_TENSORRT_GRAPH_RUNTIME /home/ubuntu/TensorRT-7.0.0.11) + + +Build and Deploy ResNet-18 with TensorRT +---------------------------------------- + +Create a Relay graph from a MXNet ResNet-18 model. + +.. code:: python + + import tvm + from tvm import relay + import mxnet + from mxnet.gluon.model_zoo.vision import get_model + + dtype = "float32" + input_shape = (1, 3, 224, 224) + block = get_model('resnet18_v1', pretrained=True) + mod, params = relay.frontend.from_mxnet(block, shape={'data': input_shape}, dtype=dtype) + + +Annotate and partition the graph for TensorRT. All ops which are supported by the TensorRT +integration will be marked and offloaded to TensorRT. The rest of the ops will go through the +regular TVM CUDA compilation and code generation. + +.. 
code:: python + + from tvm.relay.op.contrib.tensorrt import partition_for_tensorrt + mod, config = partition_for_tensorrt(mod, params) + + +Build the Relay graph, using the new module and config returned by partition_for_tensorrt. The +target must always be a cuda target. + +.. code:: python + + target = "cuda" + with tvm.transform.PassContext(opt_level=3, config={'relay.ext.tensorrt.options': config}): + lib = relay.build(mod, target=target, params=params) + + +Export the module. + +.. code:: python + + lib.export_library('compiled.so') + + +Load module and run inference. The first run will take longer because the TensorRT engine will have +to be built. + +.. code:: python + + ctx = tvm.gpu(0) + loaded_lib = tvm.runtime.load_module('compiled.so') + gen_module = tvm.contrib.graph_runtime.GraphModule(loaded_lib['default'](ctx)) + input_data = np.random.uniform(0, 1, input_shape).astype(dtype) + gen_module.run(data=input_data) + + +Partitioning and Compilation Settings +----------------
Review comment:
```suggestion
-------------------------------------
```
########## File path: docs/deploy/tensorrt.rst ########## @@ -0,0 +1,288 @@ +.. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License.
+ +Relay TensorRT Integration +========================== +**Author**: `Trevor Morris <https://github.com/trevor-m>`_ + +Introduction +------------ + +NVIDIA TensorRT is a library for optimized deep learning inference. This integration will offload as +many operators as possible from Relay to TensorRT, providing a performance boost on NVIDIA GPUs +without the need to tune schedules. + +This guide will demonstrate how to install TensorRT and build TVM with TensorRT BYOC and runtime +enabled. It will also provide example code to compile and run a ResNet-18 model using TensorRT and +how to configure the compilation and runtime settings. Finally, we document the supported operators +and how to extend the integration to support other operators. + +Installing TensorRT +------------------- + +In order to download TensorRT, you will need to create an NVIDIA Developer program account. Please +see NVIDIA's documentation for more info: +https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html. If you have a Jetson device +such as a TX1, TX2, Xavier, or Nano, TensorRT will already be installed on the device via the +JetPack SDK. + +There are two methods to install TensorRT: + +* System install via deb or rpm package. +* Tar file installation. + +With the tar file installation method, you must provide the path of the extracted tar archive to +USE_TENSORT_GRAPH_RUNTIME=/path/to/TensorRT. With the system install method, +USE_TENSORT_GRAPH_RUNTIME=ON will automatically locate your installation. + +Building TVM with TensorRT support +---------------------------------- + +There are two separate build flags for TensorRT integration in TVM: + +* USE_TENSORT=ON/OFF - This flag will enable compiling a TensorRT module, which does not require any +TensorRT library. +* USE_TENSORT_GRAPH_RUNTIME=ON/OFF/path-to-TensorRT - This flag will enable the TensorRT runtime +module. This will build TVM against the TensorRT libraries. + +Example setting in config.cmake file: + +.. 
code:: cmake + + set(USE_TENSORRT ON) + set(USE_TENSORRT_GRAPH_RUNTIME /home/ubuntu/TensorRT-7.0.0.11) + + +Build and Deploy ResNet-18 with TensorRT +---------------------------------------- + +Create a Relay graph from a MXNet ResNet-18 model. + +.. code:: python + + import tvm + from tvm import relay + import mxnet + from mxnet.gluon.model_zoo.vision import get_model + + dtype = "float32" + input_shape = (1, 3, 224, 224) + block = get_model('resnet18_v1', pretrained=True) + mod, params = relay.frontend.from_mxnet(block, shape={'data': input_shape}, dtype=dtype) + + +Annotate and partition the graph for TensorRT. All ops which are supported by the TensorRT +integration will be marked and offloaded to TensorRT. The rest of the ops will go through the +regular TVM CUDA compilation and code generation. + +.. code:: python + + from tvm.relay.op.contrib.tensorrt import partition_for_tensorrt + mod, config = partition_for_tensorrt(mod, params) + + +Build the Relay graph, using the new module and config returned by partition_for_tensorrt. The +target must always be a cuda target.
Review comment: It would be better to briefly say something about the `config`, such as what it is and how users should use it. Update: I found a later section that covers this, so we can just provide a pointer (e.g., "we will introduce the `config` in detail in [section link]").
########## File path: docs/deploy/tensorrt.rst ########## @@ -0,0 +1,288 @@ +.. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +..
Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +Relay TensorRT Integration +========================== +**Author**: `Trevor Morris <https://github.com/trevor-m>`_ + +Introduction +------------ + +NVIDIA TensorRT is a library for optimized deep learning inference. This integration will offload as +many operators as possible from Relay to TensorRT, providing a performance boost on NVIDIA GPUs +without the need to tune schedules. + +This guide will demonstrate how to install TensorRT and build TVM with TensorRT BYOC and runtime +enabled. It will also provide example code to compile and run a ResNet-18 model using TensorRT and +how to configure the compilation and runtime settings. Finally, we document the supported operators +and how to extend the integration to support other operators. + +Installing TensorRT +------------------- + +In order to download TensorRT, you will need to create an NVIDIA Developer program account. Please +see NVIDIA's documentation for more info: +https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html. If you have a Jetson device +such as a TX1, TX2, Xavier, or Nano, TensorRT will already be installed on the device via the +JetPack SDK. + +There are two methods to install TensorRT: + +* System install via deb or rpm package. +* Tar file installation. + +With the tar file installation method, you must provide the path of the extracted tar archive to +USE_TENSORT_GRAPH_RUNTIME=/path/to/TensorRT. With the system install method, +USE_TENSORT_GRAPH_RUNTIME=ON will automatically locate your installation. 
+ +Building TVM with TensorRT support +---------------------------------- + +There are two separate build flags for TensorRT integration in TVM: + +* USE_TENSORT=ON/OFF - This flag will enable compiling a TensorRT module, which does not require any +TensorRT library. +* USE_TENSORT_GRAPH_RUNTIME=ON/OFF/path-to-TensorRT - This flag will enable the TensorRT runtime +module. This will build TVM against the TensorRT libraries. + +Example setting in config.cmake file: + +.. code:: cmake + + set(USE_TENSORRT ON) + set(USE_TENSORRT_GRAPH_RUNTIME /home/ubuntu/TensorRT-7.0.0.11) + + +Build and Deploy ResNet-18 with TensorRT +---------------------------------------- + +Create a Relay graph from a MXNet ResNet-18 model. + +.. code:: python + + import tvm + from tvm import relay + import mxnet + from mxnet.gluon.model_zoo.vision import get_model + + dtype = "float32" + input_shape = (1, 3, 224, 224) + block = get_model('resnet18_v1', pretrained=True) + mod, params = relay.frontend.from_mxnet(block, shape={'data': input_shape}, dtype=dtype) + + +Annotate and partition the graph for TensorRT. All ops which are supported by the TensorRT +integration will be marked and offloaded to TensorRT. The rest of the ops will go through the +regular TVM CUDA compilation and code generation. + +.. code:: python + + from tvm.relay.op.contrib.tensorrt import partition_for_tensorrt + mod, config = partition_for_tensorrt(mod, params) + + +Build the Relay graph, using the new module and config returned by partition_for_tensorrt. The +target must always be a cuda target. + +.. code:: python + + target = "cuda" + with tvm.transform.PassContext(opt_level=3, config={'relay.ext.tensorrt.options': config}): + lib = relay.build(mod, target=target, params=params) + + +Export the module. + +.. code:: python + + lib.export_library('compiled.so') + + +Load module and run inference. The first run will take longer because the TensorRT engine will have +to be built. 
Review comment: Better to emphasize "load the module and run inference on the target machine, which must have USE_TENSORT_GRAPH_RUNTIME enabled".
########## File path: docs/deploy/tensorrt.rst ########## @@ -0,0 +1,288 @@ +.. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + +.. http://www.apache.org/licenses/LICENSE-2.0 + +.. Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +Relay TensorRT Integration +========================== +**Author**: `Trevor Morris <https://github.com/trevor-m>`_ + +Introduction +------------ + +NVIDIA TensorRT is a library for optimized deep learning inference. This integration will offload as +many operators as possible from Relay to TensorRT, providing a performance boost on NVIDIA GPUs +without the need to tune schedules. + +This guide will demonstrate how to install TensorRT and build TVM with TensorRT BYOC and runtime +enabled. It will also provide example code to compile and run a ResNet-18 model using TensorRT and +how to configure the compilation and runtime settings. Finally, we document the supported operators +and how to extend the integration to support other operators. + +Installing TensorRT +------------------- + +In order to download TensorRT, you will need to create an NVIDIA Developer program account.
Please +see NVIDIA's documentation for more info: +https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html. If you have a Jetson device +such as a TX1, TX2, Xavier, or Nano, TensorRT will already be installed on the device via the +JetPack SDK. + +There are two methods to install TensorRT: + +* System install via deb or rpm package. +* Tar file installation. + +With the tar file installation method, you must provide the path of the extracted tar archive to +USE_TENSORT_GRAPH_RUNTIME=/path/to/TensorRT. With the system install method, +USE_TENSORT_GRAPH_RUNTIME=ON will automatically locate your installation. + +Building TVM with TensorRT support +---------------------------------- + +There are two separate build flags for TensorRT integration in TVM: + +* USE_TENSORT=ON/OFF - This flag will enable compiling a TensorRT module, which does not require any +TensorRT library. +* USE_TENSORT_GRAPH_RUNTIME=ON/OFF/path-to-TensorRT - This flag will enable the TensorRT runtime +module. This will build TVM against the TensorRT libraries.
Review comment: It might be better to emphasize the scenario. For example, we can say that these two flags also enable cross-compilation: USE_TENSORT=ON lets you build a module with TensorRT support on a host machine, while USE_TENSORT_GRAPH_RUNTIME=ON enables the TVM runtime on an edge device to execute the TensorRT module.
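The cross-compilation scenario raised in the last review comment could be illustrated with two `config.cmake` fragments along these lines. This is only a sketch: it assumes the flag spellings used in the diff's own `config.cmake` example (`USE_TENSORRT`, `USE_TENSORRT_GRAPH_RUNTIME`) are the intended ones, and the choice of `ON` for the device presumes a system-wide TensorRT install there.

```cmake
# --- config.cmake on the host machine (compile only) ---
# Per the doc under review, this flag enables compiling a TensorRT module
# and does not require any TensorRT library on the host.
set(USE_TENSORRT ON)

# --- config.cmake on the edge device (run only) ---
# Builds the TVM runtime against the TensorRT libraries.
# ON locates a system install; with a tar install, the doc says to pass
# the path of the extracted archive instead of ON.
set(USE_TENSORRT_GRAPH_RUNTIME ON)
```

Whether the device build also needs `USE_TENSORRT` is not stated in the diff, so the sketch leaves it unset.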