jcf94 commented on a change in pull request #8808:
URL: https://github.com/apache/tvm/pull/8808#discussion_r696218247



##########
File path: src/runtime/contrib/tensorrt/tensorrt_runtime.cc
##########
@@ -125,17 +147,26 @@ class TensorRTRuntime : public JSONRuntimeBase {
 
   /*! \brief Run inference using built engine. */
   void Run() override {
+    

Review comment:
   > We have to build the engine first to know the input binding order that tensorrt assigns to the inputs. It might not match TVM input signature directly.
   > This is the same strategy used by TF-TRT
   
   Yeah, I mean we could also put all of the input information checks here, and use something like `unordered_map<string, void*>` and `unordered_map<string, size_t>` to create the calibrator and call set_batch here. During real inference, this information is no longer needed.
   
   The TensorRT `get_batch` function takes a `const char* names[]` parameter that can be used to look up the data by name when copying it out. So with this approach the binding order does not matter.
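   To illustrate, here is a minimal, self-contained sketch of the name-based lookup idea (not the actual TVM or TensorRT code): a hypothetical `NameBasedCalibrator` class stores device buffers in an `unordered_map<string, void*>` and fills the bindings array by name, mirroring the `(void* bindings[], const char* names[], int nbBindings)` shape of TensorRT's `IInt8Calibrator::getBatch`, so the caller's input order does not have to match TVM's signature:
   ```cpp
   #include <iostream>
   #include <string>
   #include <unordered_map>
   
   // Hypothetical sketch: resolve calibration buffers by name rather than by
   // position, so TensorRT's binding order need not match TVM's input order.
   class NameBasedCalibrator {
    public:
     void SetBuffer(const std::string& name, void* ptr) { dev_buffers_[name] = ptr; }
   
     // Fill `bindings` with the buffer registered under each requested name.
     // Returns false if any name has no registered buffer.
     bool getBatch(void* bindings[], const char* names[], int nb_bindings) {
       for (int i = 0; i < nb_bindings; ++i) {
         auto it = dev_buffers_.find(names[i]);
         if (it == dev_buffers_.end()) return false;
         bindings[i] = it->second;
       }
       return true;
     }
   
    private:
     std::unordered_map<std::string, void*> dev_buffers_;
   };
   
   int main() {
     NameBasedCalibrator calib;
     int a = 1, b = 2;
     calib.SetBuffer("input_0", &a);
     calib.SetBuffer("input_1", &b);
   
     // TensorRT may request bindings in a different order than TVM's signature.
     const char* names[] = {"input_1", "input_0"};
     void* bindings[2] = {nullptr, nullptr};
     bool ok = calib.getBatch(bindings, names, 2);
     std::cout << (ok && bindings[0] == &b && bindings[1] == &a ? "ok" : "fail")
               << std::endl;
     return 0;
   }
   ```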
   
   At least in TF-2.4.0, they used:
   ```cpp
     // Construct a calibrator for future calibration.
     TRTInt8Calibrator(
         const std::unordered_map<string, std::pair<void*, size_t>>& dev_buffers,
         int batch_size, string engine_name);
   
     // Feed calibration data to the calibrator, and return true if the data is
     // accepted. Return false if the calibrator has been terminated.
     bool setBatch(const std::unordered_map<string, void*>& data,
                   const cudaStream_t stream);
   ```
   I think that would be a good example for reference.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

