mbs-octoml opened a new pull request, #12105:
URL: https://github.com/apache/tvm/pull/12105

   See https://github.com/apache/tvm-rfcs/blob/main/rfcs/0062-collage.md.
   
   This completes our checkin of our Collage 'sketch' branch into main. Special
   thanks to Matthew Barrett for his help getting this over the line.
   
   The only C++ functionality added here is for 'pruning' candidates. This is a
   somewhat speculative algorithm (and I've called that out in the comments)
   which tries to elide candidate partitions which will 'obviously' not
   contribute to the final optimal partitioning. For largish models such as
   GPT2 this can significantly reduce the number of candidates we need to
   actually measure latency on. I beefed up the MockCostEstimator to make it
   possible to assert that pruning occurred from within the
   test_pass_collage_partition.py unit test.
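   
   As a rough illustration of how a mock estimator can make pruning
   observable in a test, here is a minimal, self-contained Python sketch.
   Everything in it is hypothetical: `Candidate`, `CountingMockEstimator`,
   `prune`, and the target labels are stand-ins invented for this example and
   are not the TVM API, and the toy pruning rule is not the rule implemented
   in this PR.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for a candidate partition."""
    name: str
    nodes: FrozenSet[int]  # indices of the graph nodes this candidate covers
    target: str            # label of the backend that would compile it


class CountingMockEstimator:
    """Returns a fixed per-node cost for each target and counts its calls."""

    def __init__(self, target_costs: Dict[str, float]):
        self.target_costs = target_costs
        self.num_estimates = 0

    def estimate(self, candidate: Candidate) -> float:
        self.num_estimates += 1
        return self.target_costs[candidate.target] * len(candidate.nodes)


def prune(candidates: List[Candidate]) -> List[Candidate]:
    """Toy rule (an assumption, NOT Collage's rule): drop a candidate when
    another candidate for the same target covers a strict superset of nodes."""
    return [
        c for c in candidates
        if not any(o.target == c.target and c.nodes < o.nodes for o in candidates)
    ]


candidates = [
    Candidate("a", frozenset({0}), "tensorrt"),
    Candidate("b", frozenset({0, 1}), "tensorrt"),
    Candidate("c", frozenset({1}), "tvm"),
]
estimator = CountingMockEstimator({"tensorrt": 1.0, "tvm": 2.0})
kept = prune(candidates)
costs = {c.name: estimator.estimate(c) for c in kept}

# Fewer estimates than candidates means pruning removed something.
assert estimator.num_estimates < len(candidates)
```

   Counting calls on the estimator, rather than inspecting the pruner
   directly, lets a test check the externally visible effect: fewer
   candidates reach the (expensive) measurement step.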
   
   The rest of this PR adds the demo_collage_partition.py driver file we've
   been using to test and measure performance differences against various
   baselines (though only for the CUDA ecosystem). To eliminate loading time,
   the models of interest are expressed directly in Relay text form in
   menagerie.py.
   

