mbs-octoml opened a new pull request, #11770:
URL: https://github.com/apache/tvm/pull/11770

   I tried to do to the TensorRT integration what #11631 did to the CUTLASS 
integration, viz:
    - Make sure all compilation options are passed in Target instances. This 
helps Collage.
    - Use a custom pass invoked via RelayToTIRTargetHooks instead of the 
relay.ext.$toolchain mechanism.
      This helps us decouple external codegen from lowering.
   
   This PR collects the prep for that change:
    - TensorRT uses the JSONSerializer visitor to encode each partition
      function. Previously, when the visitor encountered a Constant it simply
      generated and recorded a name for the constant. Then, completely
      separately, and via a callback in TECompiler, the function was visited
      again in the same order and with the same name generation convention by
      a ConstantUpdater to actually collect the bindings, which were then
      encoded into a ConstLoaderModule to be made available at runtime.
   
      However, if all TensorRT compilation is to be done by a stand-alone pass
      there's no TECompiler callback hackery available. So I've added a
      "const_name_to_ndarray" attribute to the IRModule of type
      Map<String, runtime::NDArray> so that named constants can be accumulated
      throughout compilation by any pass which needs to do so. The Graph, AOT
      and VM executors are all updated to merge those constants into the final
      runtime artifact.
   
      (Compare with "Constants", the equivalent attribute for extracting TIR 
AllocateConsts.)
   
    - The TensorRT tests use the create_executor interface but it wasn't quite 
ready for the
      new more general form of passing list-of-targets.
   
    - I want TensorRT compilation to work out of the box, without the need for
      any special targets, when all the default options apply. I also went
      back and made my earlier CUTLASS integration follow the same convention.
   
    - TensorRT actually needs to 'undo' partitionings in some situations. Add
      an InlineCompilerFunctions pass to make that robust. In particular, it
      must undo both the 'partitioning' (i.e. separating out the "Compiler"
      function) and any 'compositing' (i.e. separating out small sub-graphs as
      "Composite" functions).
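
   To illustrate what 'undoing' both layers means, here is a minimal sketch on
   a toy expression IR. It is not the InlineCompilerFunctions implementation
   and uses no TVM APIs: the Var, Call and Function classes, the
   inline_compiler_functions helper and the "toy_backend" name are all
   hypothetical. Only the "Compiler" and "Composite" attribute names come from
   the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Var:
    name: str


@dataclass
class Call:
    op: Union[str, "Function"]  # a primitive op name or an inline function
    args: List["Expr"]


@dataclass
class Function:
    params: List[Var]
    body: "Expr"
    attrs: Dict[str, str] = field(default_factory=dict)


Expr = Union[Var, Call]


def substitute(expr: Expr, env: Dict[str, Expr]) -> Expr:
    """Replace parameter references with the bound arguments."""
    if isinstance(expr, Var):
        return env.get(expr.name, expr)
    # Nested functions are closed over their own params, so only args change.
    return Call(expr.op, [substitute(a, env) for a in expr.args])


def inline_compiler_functions(expr: Expr, compiler: str, inside: bool = False) -> Expr:
    """Inline calls to functions tagged Compiler=<compiler>, plus any
    Composite functions found inside those partitions (toy version)."""
    if isinstance(expr, Var):
        return expr
    args = [inline_compiler_functions(a, compiler, inside) for a in expr.args]
    op = expr.op
    if isinstance(op, Function):
        is_partition = op.attrs.get("Compiler") == compiler
        is_composite = inside and "Composite" in op.attrs
        if is_partition or is_composite:
            # Flatten the body first (now 'inside' a partition), then bind args.
            body = inline_compiler_functions(op.body, compiler, inside=True)
            env = {p.name: a for p, a in zip(op.params, args)}
            return substitute(body, env)
    return Call(op, args)


# A composite add+relu wrapped in a partition for the hypothetical backend.
composite = Function(
    [Var("a"), Var("b")],
    Call("nn.relu", [Call("add", [Var("a"), Var("b")])]),
    {"Composite": "toy_backend.add_relu"},
)
partition = Function(
    [Var("p0"), Var("p1")],
    Call(composite, [Var("p0"), Var("p1")]),
    {"Compiler": "toy_backend"},
)
main_body = Call(partition, [Var("x"), Var("y")])
flattened = inline_compiler_functions(main_body, "toy_backend")
```

   In this sketch, partitions tagged for a different "Compiler" are left
   untouched, as are composite functions that sit outside any inlined
   partition.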
   

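   The constant handling described in the first prep item can be pictured with
   a small plain-Python model. This is only a sketch of the idea, assuming a
   module-level attribute that passes append to. ToyModule, record_constant,
   toy_codegen_pass, build_runtime_params and the generated constant names are
   hypothetical; only the "const_name_to_ndarray" key is taken from the
   description, and plain lists stand in for runtime::NDArray.

```python
from typing import Dict, List

CONST_ATTR = "const_name_to_ndarray"  # attribute key named in the description


class ToyModule:
    """Stand-in for an IRModule: just a bag of attributes."""

    def __init__(self) -> None:
        self.attrs: Dict[str, Dict[str, List[float]]] = {}


def record_constant(mod: ToyModule, name: str, value: List[float]) -> None:
    """Accumulate a named constant on the module, rejecting duplicate names."""
    consts = mod.attrs.setdefault(CONST_ATTR, {})
    if name in consts:
        raise ValueError(f"constant {name!r} already recorded")
    consts[name] = value


def toy_codegen_pass(mod: ToyModule, prefix: str, constants: List[List[float]]) -> List[str]:
    """A stand-alone pass names each constant and records its binding
    in one visit, so no second traversal is needed to collect them."""
    names = []
    for i, value in enumerate(constants):
        name = f"{prefix}_const_{i}"  # hypothetical naming convention
        record_constant(mod, name, value)
        names.append(name)
    return names


def build_runtime_params(mod: ToyModule, params: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """What an executor build step might do: merge accumulated constants
    into the parameters shipped with the final artifact."""
    merged = dict(params)
    for name, value in mod.attrs.get(CONST_ATTR, {}).items():
        if name in merged:
            raise ValueError(f"constant {name!r} clashes with an existing parameter")
        merged[name] = value
    return merged


mod = ToyModule()
names = toy_codegen_pass(mod, "toy_backend_main_0", [[1.0, 2.0], [3.0]])
artifact_params = build_runtime_params(mod, {"p0": [0.0]})
```

   The point of the design is that naming and binding happen in the same
   place, so the two no longer depend on a second traversal reproducing the
   same visit order.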

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
