[GitHub] [incubator-tvm] comaniac commented on a change in pull request #6395: [BYOC][TensorRT] TensorRT BYOC integration

GitBox Tue, 06 Oct 2020 10:21:13 -0700


comaniac commented on a change in pull request #6395:
URL: https://github.com/apache/incubator-tvm/pull/6395#discussion_r500463790




##########
File path: docs/deploy/tensorrt.rst
##########
@@ -0,0 +1,288 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+Relay TensorRT Integration
+==========================
+**Author**: `Trevor Morris <https://github.com/trevor-m>`_
+
+Introduction
+------------
+
+NVIDIA TensorRT is a library for optimized deep learning inference. This integration will offload as
+many operators as possible from Relay to TensorRT, providing a performance boost on NVIDIA GPUs
+without the need to tune schedules.
+
+This guide will demonstrate how to install TensorRT and build TVM with TensorRT BYOC and runtime
+enabled. It will also provide example code to compile and run a ResNet-18 model using TensorRT and
+how to configure the compilation and runtime settings. Finally, we document the supported operators
+and how to extend the integration to support other operators.
+
+Installing TensorRT
+-------------------
+
+In order to download TensorRT, you will need to create an NVIDIA Developer program account. Please
+see NVIDIA's documentation for more info:
+https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html. If you have a Jetson device
+such as a TX1, TX2, Xavier, or Nano, TensorRT will already be installed on the device via the
+JetPack SDK.
+
+There are two methods to install TensorRT:
+
+* System install via deb or rpm package.
+* Tar file installation.
+
+With the tar file installation method, you must provide the path of the extracted tar archive to
+USE_TENSORRT_GRAPH_RUNTIME=/path/to/TensorRT. With the system install method,
+USE_TENSORRT_GRAPH_RUNTIME=ON will automatically locate your installation.
+
+Building TVM with TensorRT support
+----------------------------------
+
+There are two separate build flags for TensorRT integration in TVM:
+
+* USE_TENSORRT=ON/OFF - This flag will enable compiling a TensorRT module, which does not require any
+TensorRT library.
+* USE_TENSORRT_GRAPH_RUNTIME=ON/OFF/path-to-TensorRT - This flag will enable the TensorRT runtime
+module. This will build TVM against the TensorRT libraries.
+
+Example setting in config.cmake file:
+
+.. code:: cmake
+
+    set(USE_TENSORRT ON)
+    set(USE_TENSORRT_GRAPH_RUNTIME /home/ubuntu/TensorRT-7.0.0.11)
+
+
+Build and Deploy ResNet-18 with TensorRT
+----------------------------------------
+
+Create a Relay graph from an MXNet ResNet-18 model.
+
+.. code:: python
+
+    import tvm
+    from tvm import relay
+    import mxnet
+    from mxnet.gluon.model_zoo.vision import get_model
+
+    dtype = "float32"
+    input_shape = (1, 3, 224, 224)
+    block = get_model('resnet18_v1', pretrained=True)
+    mod, params = relay.frontend.from_mxnet(block, shape={'data': input_shape}, dtype=dtype)
+
+
+Annotate and partition the graph for TensorRT. All ops which are supported by the TensorRT
+integration will be marked and offloaded to TensorRT. The rest of the ops will go through the
+regular TVM CUDA compilation and code generation.
+
+.. code:: python
+
+    from tvm.relay.op.contrib.tensorrt import partition_for_tensorrt
+    mod, config = partition_for_tensorrt(mod, params)
+
+
+Build the Relay graph, using the new module and config returned by partition_for_tensorrt. The
+target must always be a cuda target.
+
+.. code:: python
+
+    target = "cuda"
+    with tvm.transform.PassContext(opt_level=3, config={'relay.ext.tensorrt.options': config}):
+        lib = relay.build(mod, target=target, params=params)
+
+
+Export the module.
+
+.. code:: python
+
+    lib.export_library('compiled.so')
+
+
+Load module and run inference. The first run will take longer because the TensorRT engine will have
+to be built.
+
+.. code:: python
+
+    import numpy as np
+
+    ctx = tvm.gpu(0)
+    loaded_lib = tvm.runtime.load_module('compiled.so')
+    gen_module = tvm.contrib.graph_runtime.GraphModule(loaded_lib['default'](ctx))
+    input_data = np.random.uniform(0, 1, input_shape).astype(dtype)
+    gen_module.run(data=input_data)
+
+
+Partitioning and Compilation Settings
+----------------

Review comment:
       ```suggestion
   -------------------------------------
   ```

##########
File path: docs/deploy/tensorrt.rst
##########
@@ -0,0 +1,288 @@
+Build the Relay graph, using the new module and config returned by partition_for_tensorrt. The
+target must always be a cuda target.

Review comment:
       It would be better to briefly explain the `config` here: what it is and how users should use it.
   
   Update: I found that a later section covers this, so we can just provide a pointer (e.g., "we will introduce the `config` in detail in [section link]").

##########
File path: docs/deploy/tensorrt.rst
##########
@@ -0,0 +1,288 @@
+Load module and run inference. The first run will take longer because the TensorRT engine will have
+to be built.

Review comment:
       Better to emphasize that the module is loaded and inference is run on the target machine, which must have USE_TENSORRT_GRAPH_RUNTIME enabled.

##########
File path: docs/deploy/tensorrt.rst
##########
@@ -0,0 +1,288 @@
+There are two separate build flags for TensorRT integration in TVM:
+
+* USE_TENSORRT=ON/OFF - This flag will enable compiling a TensorRT module, which does not require any
+TensorRT library.
+* USE_TENSORRT_GRAPH_RUNTIME=ON/OFF/path-to-TensorRT - This flag will enable the TensorRT runtime
+module. This will build TVM against the TensorRT libraries.

Review comment:
       It might be better to emphasize the scenario. For example, we can say that these two flags also enable cross-compilation: USE_TENSORRT=ON lets you build a module with TensorRT support on a host machine, while USE_TENSORRT_GRAPH_RUNTIME=ON enables the TVM runtime on an edge device to execute the TensorRT module.
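   
   A minimal sketch of the two configurations this scenario implies, assuming the flag spellings used in the `config.cmake` example in the diff (`USE_TENSORRT` and `USE_TENSORRT_GRAPH_RUNTIME`). The split into a host file and a device file is only an illustration, and whether the device also needs `USE_TENSORRT` is not stated in the doc:
   
   ```cmake
   # --- config.cmake on the host (compile only) ---
   # Enables building a TensorRT module; per the doc, no TensorRT library is needed.
   set(USE_TENSORRT ON)
   # The host does not execute the module, so the runtime can stay off.
   set(USE_TENSORRT_GRAPH_RUNTIME OFF)
   
   # --- config.cmake on the edge device (runtime) ---
   # Builds TVM against the TensorRT libraries so the module can be executed.
   # Use ON for a system install, or the extracted tar path (e.g. /path/to/TensorRT).
   set(USE_TENSORRT_GRAPH_RUNTIME ON)
   ```
   
   With this split, the `compiled.so` exported on the host would be copied to the device and loaded there.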




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
