tiandiao123 commented on a change in pull request #8808:
URL: https://github.com/apache/tvm/pull/8808#discussion_r698740623
##########
File path: src/runtime/contrib/tensorrt/tensorrt_runtime.cc
##########
@@ -66,7 +78,16 @@ class TensorRTRuntime : public JSONRuntimeBase {
         use_implicit_batch_(true),
         max_workspace_size_(size_t(1) << 30),
         max_batch_size_(-1),
-        multi_engine_mode_(false) {}
+        multi_engine_mode_(false) {
+    const bool use_int8 = dmlc::GetEnv("TVM_TENSORRT_USE_INT8", false);
Review comment:
> I like the idea of having both options, with the environment variable being able to override whatever is set during compilation.
> However, I suggest we make this improvement in a separate PR, and keep only the environment variable method in this PR.
ok!
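For reference, here is a minimal sketch of the environment-variable path discussed above (the use_int8_ member and the log message are illustrative assumptions, not necessarily the final code):

    // Read the opt-in flag once when the runtime module is constructed; defaults to off.
    const bool use_int8 = dmlc::GetEnv("TVM_TENSORRT_USE_INT8", false);
    // Remember the choice so the TensorRT builder can later be asked for INT8 precision.
    use_int8_ = use_int8;
    if (use_int8_) {
      LOG(INFO) << "TVM_TENSORRT_USE_INT8 is set; engines will be built with INT8 calibration.";
    }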
##########
File path: src/runtime/contrib/tensorrt/tensorrt_builder.cc
##########
@@ -40,30 +40,30 @@ namespace contrib {
 TensorRTBuilder::TensorRTBuilder(TensorRTLogger* logger,
                                  const std::vector<const DLTensor*>& data_entry,
                                  size_t max_workspace_size, bool use_implicit_batch, bool use_fp16,
-                                 int batch_size)
+                                 int batch_size, nvinfer1::IInt8Calibrator* calibrator)
     : data_entry_(data_entry),
       max_workspace_size_(max_workspace_size),
       use_implicit_batch_(use_implicit_batch),
       use_fp16_(use_fp16),
       batch_size_(batch_size) {
   // Create TRT builder and network.
   builder_ = nvinfer1::createInferBuilder(*logger);
-#if TRT_VERSION_GE(6, 0, 1)
-  // Use INetworkV2.
-  auto flags =
-      1U << static_cast<uint32_t>(nvinfer1::NetworkDefinitionCreationFlag::kEXPLICIT_BATCH);
-  if (use_implicit_batch_) {
-    flags = 0U;
-    builder_->setMaxBatchSize(batch_size_);
-  }
-  network_ = builder_->createNetworkV2(flags);
-#else
+  LOG(INFO) << "create a builder_ ";
+  use_int8_ = false;
Review comment:
fixed it
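For context, a plausible shape of the fixed constructor tail, with the stray LOG(INFO) dropped (deriving the INT8 flag from whether a calibrator was passed in is my assumption for illustration):

    // Create TRT builder and network.
    builder_ = nvinfer1::createInferBuilder(*logger);
    // INT8 is only meaningful when the caller supplied a calibrator.
    calibrator_ = calibrator;
    use_int8_ = (calibrator != nullptr);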
##########
File path: src/runtime/contrib/tensorrt/tensorrt_builder.cc
##########
@@ -40,30 +40,30 @@ namespace contrib {
 TensorRTBuilder::TensorRTBuilder(TensorRTLogger* logger,
                                  const std::vector<const DLTensor*>& data_entry,
                                  size_t max_workspace_size, bool use_implicit_batch, bool use_fp16,
-                                 int batch_size)
+                                 int batch_size, nvinfer1::IInt8Calibrator* calibrator)
     : data_entry_(data_entry),
       max_workspace_size_(max_workspace_size),
       use_implicit_batch_(use_implicit_batch),
       use_fp16_(use_fp16),
       batch_size_(batch_size) {
   // Create TRT builder and network.
   builder_ = nvinfer1::createInferBuilder(*logger);
-#if TRT_VERSION_GE(6, 0, 1)
Review comment:
> Why do these version macros need to be removed?
>
> In my experience with newer versions of TRT, using an API like `builder_->setInt8Mode` will emit warnings (even in TRT 8, some APIs have been deprecated ... better to test these against a newer version.)
I have re-added them to the code base.
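For illustration, on TensorRT 6+ the non-deprecated way to request INT8 goes through IBuilderConfig rather than IBuilder::setInt8Mode; a sketch behind the restored version guard (exact placement and member names are assumptions):

#if TRT_VERSION_GE(6, 0, 1)
  if (use_int8_) {
    // BuilderFlag::kINT8 plus a calibrator on the builder config replaces the deprecated IBuilder::setInt8Mode().
    config_->setFlag(nvinfer1::BuilderFlag::kINT8);
    config_->setInt8Calibrator(calibrator_);
  }
#endif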
##########
File path: src/runtime/contrib/tensorrt/tensorrt_builder.cc
##########
@@ -156,8 +161,18 @@ TensorRTEngineAndContext TensorRTBuilder::BuildEngine() {
   config_ = builder_->createBuilderConfig();
   config_->setMaxWorkspaceSize(max_workspace_size_);
   if (use_fp16_) {
+    config_->setFlag(nvinfer1::BuilderFlag::kGPU_FALLBACK);
Review comment:
> I think this is not needed
fixed it already
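With the fallback flag removed, the fp16 branch presumably reduces to the precision flag alone (sketch, not the merged code):

  if (use_fp16_) {
    config_->setFlag(nvinfer1::BuilderFlag::kFP16);
  }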
##########
File path: src/runtime/contrib/tensorrt/tensorrt_runtime.cc
##########
@@ -66,7 +78,16 @@ class TensorRTRuntime : public JSONRuntimeBase {
         use_implicit_batch_(true),
         max_workspace_size_(size_t(1) << 30),
         max_batch_size_(-1),
-        multi_engine_mode_(false) {}
+        multi_engine_mode_(false) {
+    const bool use_int8 = dmlc::GetEnv("TVM_TENSORRT_USE_INT8", false);
Review comment:
Hi @FrozenGene @trevor-m @jcf94, I have modified the original design, please review it. As for passing parameters into the partition_for_tensorrt API, as trevor-m suggested, perhaps I can provide a separate PR after this one is merged? For now, I have kept the design of setting the environment variable.
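As a usage illustration of the environment-variable design kept in this PR, a caller opts in before the runtime module is instantiated; for example from C++ (shown purely as an assumed usage pattern):

    #include <cstdlib>  // for setenv

    // Enable INT8 calibration for subsequently created TensorRT runtime modules.
    setenv("TVM_TENSORRT_USE_INT8", "1", /*overwrite=*/1);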