tiandiao123 commented on a change in pull request #8808:
URL: https://github.com/apache/tvm/pull/8808#discussion_r698740623
##########
File path: src/runtime/contrib/tensorrt/tensorrt_runtime.cc
##########
@@ -66,7 +78,16 @@ class TensorRTRuntime : public JSONRuntimeBase {
         use_implicit_batch_(true),
         max_workspace_size_(size_t(1) << 30),
         max_batch_size_(-1),
-        multi_engine_mode_(false) {}
+        multi_engine_mode_(false) {
+    const bool use_int8 = dmlc::GetEnv("TVM_TENSORRT_USE_INT8", false);
Review comment:
> I like the idea of having both options, with the environment variable being able to override whatever is set during compilation.
> However, I suggest we make this improvement in a separate PR, and keep only the environment variable method in this PR.
ok!
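For reference, here is a minimal sketch of the environment-variable path discussed above (the use_int8_ member and the log message are illustrative assumptions, not necessarily the final code):

    // Read the opt-in flag once when the runtime module is constructed; defaults to off.
    const bool use_int8 = dmlc::GetEnv("TVM_TENSORRT_USE_INT8", false);
    // Remember the choice so the TensorRT builder can later be asked for INT8 precision.
    use_int8_ = use_int8;
    if (use_int8_) {
      LOG(INFO) << "TVM_TENSORRT_USE_INT8 is set; engines will be built with INT8 calibration.";
    }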
##########
File path: src/runtime/contrib/tensorrt/tensorrt_builder.cc
##########
@@ -40,30 +40,30 @@ namespace contrib {
 TensorRTBuilder::TensorRTBuilder(TensorRTLogger* logger,
                                  const std::vector<const DLTensor*>& data_entry,
                                  size_t max_workspace_size, bool use_implicit_batch, bool use_fp16,
-                                 int batch_size)
+                                 int batch_size, nvinfer1::IInt8Calibrator* calibrator)
     : data_entry_(data_entry),
       max_workspace_size_(max_workspace_size),
       use_implicit_batch_(use_implicit_batch),
       use_fp16_(use_fp16),
       batch_size_(batch_size) {
   // Create TRT builder and network.
   builder_ = nvinfer1::createInferBuilder(*logger);
-#if TRT_VERSION_GE(6, 0, 1)
-  // Use INetworkV2.
-  auto flags =
-      1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
-  if (use_implicit_batch_) {
-    flags = 0U;
-    builder_->setMaxBatchSize(batch_size_);
-  }
-  network_ = builder_->createNetworkV2(flags);
-#else
+  LOG(INFO) << "create a builder_ ";
+  use_int8_ = false;
Review comment:
fixed it
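For context, a plausible shape of the fixed constructor tail, with the stray LOG(INFO) dropped (deriving the INT8 flag from whether a calibrator was passed in is my assumption for illustration):

    // Create TRT builder and network.
    builder_ = nvinfer1::createInferBuilder(*logger);
    // INT8 is only meaningful when the caller supplied a calibrator.
    calibrator_ = calibrator;
    use_int8_ = (calibrator != nullptr);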
##########
File path: src/runtime/contrib/tensorrt/tensorrt_builder.cc
##########
@@ -40,30 +40,30 @@ namespace contrib {
 TensorRTBuilder::TensorRTBuilder(TensorRTLogger* logger,
                                  const std::vector<const DLTensor*>& data_entry,
                                  size_t max_workspace_size, bool use_implicit_batch, bool use_fp16,
-                                 int batch_size)
+                                 int batch_size, nvinfer1::IInt8Calibrator* calibrator)
     : data_entry_(data_entry),
       max_workspace_size_(max_workspace_size),
       use_implicit_batch_(use_implicit_batch),
       use_fp16_(use_fp16),
       batch_size_(batch_size) {
   // Create TRT builder and network.
   builder_ = nvinfer1::createInferBuilder(*logger);
-#if TRT_VERSION_GE(6, 0, 1)
Review comment:
> Why do these version macros need to be removed?
>
> In my experience with newer versions of TRT, using an API like `builder_->setInt8Mode` will emit warnings (even in TRT 8, some APIs have been deprecated ... better to test these against a newer version.)
I have re-added them to the code base.
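For illustration, on TensorRT 6+ the non-deprecated way to request INT8 goes through IBuilderConfig rather than IBuilder::setInt8Mode; a sketch behind the restored version guard (exact placement and member names are assumptions):

#if TRT_VERSION_GE(6, 0, 1)
  if (use_int8_) {
    // BuilderFlag::kINT8 plus a calibrator on the builder config replaces the deprecated IBuilder::setInt8Mode().
    config_->setFlag(nvinfer1::BuilderFlag::kINT8);
    config_->setInt8Calibrator(calibrator_);
  }
#endif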
##########
File path: src/runtime/contrib/tensorrt/tensorrt_builder.cc
##########
@@ -156,8 +161,18 @@ TensorRTEngineAndContext TensorRTBuilder::BuildEngine() {
   config_ = builder_->createBuilderConfig();
   config_->setMaxWorkspaceSize(max_workspace_size_);
   if (use_fp16_) {
+    config_->setFlag(nvinfer1::BuilderFlag::kGPU_FALLBACK);
Review comment:
> I think this is not needed
fixed it already
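With the fallback flag removed, the fp16 branch presumably reduces to the precision flag alone (sketch, not the merged code):

  if (use_fp16_) {
    config_->setFlag(nvinfer1::BuilderFlag::kFP16);
  }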
##########
File path: src/runtime/contrib/tensorrt/tensorrt_runtime.cc
##########
@@ -66,7 +78,16 @@ class TensorRTRuntime : public JSONRuntimeBase {
         use_implicit_batch_(true),
         max_workspace_size_(size_t(1) << 30),
         max_batch_size_(-1),
-        multi_engine_mode_(false) {}
+        multi_engine_mode_(false) {
+    const bool use_int8 = dmlc::GetEnv("TVM_TENSORRT_USE_INT8", false);
Review comment:
Hi @FrozenGene @trevor-m @jcf94, I have modified the original design, please review it. As for passing parameters into the partition_for_tensorrt API, as trevor-m suggested, perhaps I can provide a separate PR after this one is merged? For now, I have kept the design of setting the environment variable.
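As a usage illustration of the environment-variable design kept in this PR, a caller opts in before the runtime module is instantiated; for example from C++ (shown purely as an assumed usage pattern):

    #include <cstdlib>  // for setenv

    // Enable INT8 calibration for subsequently created TensorRT runtime modules.
    setenv("TVM_TENSORRT_USE_INT8", "1", /*overwrite=*/1);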