trevor-m commented on pull request #6395:
URL: https://github.com/apache/incubator-tvm/pull/6395#issuecomment-692298168
> For the remaining 2 points.
>
> 2. Is it possible to move the pass before partitioning but after the merge
> compiler region pass (like `PruneTensorRTCompilerRegion`)? After the merge
> compiler region pass you should get a Relay graph with almost the same
> semantics as after partitioning. If you had a pass checking each compiler
> region against your constraints, you could probably just remove the regions
> you don't want, so that you would get only valid partitioned functions.
>
> 3. Can the TensorRT version be obtained via an API call in C++? Something
> like `tensorrt::get_version()`? If so, you can register a global symbol and
> pass the version to Python so that it can be used by the annotator.
>
>
> ```python
> def conv2d(...):
>     # allow_missing=True: returns None instead of raising if the
>     # version symbol was never registered.
>     if not tvm.get_global_func("relay.tensorrt.version", True):
>         return False
>     # get_global_func returns a PackedFunc; call it to get the version.
>     ver = tvm.get_global_func("relay.tensorrt.version")()
>     if ver == '1.0':
>         return True
>     return False
> ```
>
> If you need to manually set up the TensorRT version, then it could be like
> this: let the user specify it in `config.cmake` and we pass the value to a
> macro in C++ so that you can simply return the value. The drawback of this
> solution is that it requires rebuilding TVM to annotate for different
> TensorRT versions, and I'm not sure if that makes sense to you.
Thanks @comaniac!
> 2. Is it possible to move the pass before partitioning but after the merge
> compiler region pass (like `PruneTensorRTCompilerRegion`)? After the merge
> compiler region pass you should get a Relay graph with almost the same
> semantics as after partitioning. If you had a pass checking each compiler
> region against your constraints, you could probably just remove the regions
> you don't want, so that you would get only valid partitioned functions.
Hmm, this seems like it would make the job of the `PruneTensorRTSubgraph`
pass much more difficult. `PartitionGraph` already takes care of collecting the
inputs and outputs of a subgraph and additional processing such as making sure
there are no duplicate outputs. If `PruneTensorRTCompilerRegion` ran before
`PartitionGraph`, it would have to duplicate a lot of that work. The idea of
the pruning pass is that we should present each backend with the final subgraph
exactly as it will be passed to the codegen, and let the backend decide whether
it is valid. Are you concerned about the overhead of partitioning a subgraph
that would later be discarded?
Btw, just for reference, here is the general implementation of `PruneSubgraph`
that I originally implemented:
https://github.com/trevor-m/tvm/commit/06015a4617cfaad56adcaa0c71b485d6bd711128
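To make the ordering concrete, here is a rough sketch of where the pruning
would sit in the BYOC pipeline. The `relay.transform` passes are the standard
ones; `prune_tensorrt_subgraphs` is just a stand-in for the pass from the
commit above, not its actual implementation:

```python
import tvm
from tvm import relay

def partition_for_tensorrt(mod):
    # Standard BYOC flow: annotate supported ops, merge them into
    # compiler regions, then lift each region into its own function.
    seq = tvm.transform.Sequential([
        relay.transform.AnnotateTarget("tensorrt"),
        relay.transform.MergeCompilerRegions(),
        relay.transform.PartitionGraph(),
    ])
    mod = seq(mod)
    # Pruning runs last, so it inspects each subgraph exactly as the
    # codegen would receive it (inputs/outputs already collected,
    # duplicate outputs already removed, etc.).
    return prune_tensorrt_subgraphs(mod)
```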
> 3. Can the TensorRT version be obtained via an API call in C++? Something
> like `tensorrt::get_version()`? If so, you can register a global symbol and
> pass the version to Python so that it can be used by the annotator. If you
> need to manually set up the TensorRT version, then it could be like this:
> let the user specify it in `config.cmake` and we pass the value to a macro
> in C++ so that you can simply return the value. The drawback of this
> solution is that it requires rebuilding TVM to annotate for different
> TensorRT versions, and I'm not sure if that makes sense to you.
I have already created an API to retrieve the TRT version if TVM is compiled
with the TRT runtime enabled. However, one of our use cases is to use TVM on a
CPU-only instance to cross-compile models. For that use case, we want to be
able to target compilation for different TRT versions, since this mostly
affects the partitioning rules. I don't think having to rebuild TVM for each
target version would be a good solution.
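Just to illustrate, with a global func the override could be done entirely
from Python on the CPU-only instance, reusing the `relay.tensorrt.version`
symbol from your example (the name is only illustrative):

```python
import tvm

# Hypothetical: register/override the version global so the annotation
# rules target TensorRT 6.0.1 even though no TRT runtime is present.
tvm.register_func("relay.tensorrt.version", lambda: "6.0.1", override=True)

# The annotator then queries it as in the example above.
ver = tvm.get_global_func("relay.tensorrt.version")()
assert ver == "6.0.1"
```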
Is it possible for my annotation functions to access the pass context, and
therefore a TRT config that I will be adding as @masahi suggested? I don't see
any other Python code accessing the `PassContext` though...
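Something along these lines is what I'm imagining, assuming a
`relay.ext.tensorrt.options` config key were registered on the C++ side via
`TVM_REGISTER_PASS_CONFIG_OPTION` (the key and field names here are
hypothetical):

```python
import tvm

# Hypothetical config key; without the C++ registration, PassContext
# would reject it as an unrecognized option.
with tvm.transform.PassContext(
        config={"relay.ext.tensorrt.options": {"tensorrt_version": "6.0.1"}}):
    # An annotation function could read the target version like this.
    cfg = tvm.transform.PassContext.current().config
    ver = cfg["relay.ext.tensorrt.options"]["tensorrt_version"]
```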