mbs-octoml opened a new pull request, #12105: URL: https://github.com/apache/tvm/pull/12105
See https://github.com/apache/tvm-rfcs/blob/main/rfcs/0062-collage.md.

This completes the check-in of our Collage 'sketch' branch into main. Special thanks to Matthew Barrett for his help getting this over the line.

The only C++ functionality added here is for 'pruning' candidates. This is a somewhat speculative algorithm (and I've called that out in the comments) which tries to elide candidate partitions which will 'obviously' not contribute to the final optimal partitioning. For largish models such as GPT2 this can significantly reduce the number of candidates we need to actually measure latency on. I beefed up the MockCostEstimator to make it possible to assert that pruning occurred from within the test_pass_collage_partition.py unit test.

The rest of this PR adds the demo_collage_partition.py driver file we've been using to test and measure performance differences against various baselines (though only for the CUDA ecosystem). To eliminate loading time, the models of interest are directly expressed in Relay text form in menangerie.py.
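For readers unfamiliar with the idea, here is a minimal, self-contained Python sketch of how a test could assert that pruning reduced the number of latency measurements. It is not the C++ algorithm added in this PR. The pruning rule (keep only candidates that are not strict subsets of another candidate for the same backend) is a stand-in heuristic chosen for illustration, and `Candidate`, `CountingMockEstimator`, `prune_to_maximal` and `measure_all` are hypothetical names, not TVM APIs.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate partition: a set of node ids assigned to one backend."""
    backend: str
    nodes: FrozenSet[int]


class CountingMockEstimator:
    """Hypothetical mock estimator that records how many candidates it was asked to cost."""

    def __init__(self, per_node_cost: Dict[str, float]):
        self.per_node_cost = per_node_cost
        self.calls = 0

    def estimate(self, candidate: Candidate) -> float:
        self.calls += 1
        # Fake latency: a fixed per-node cost for the candidate's backend.
        return self.per_node_cost[candidate.backend] * len(candidate.nodes)


def prune_to_maximal(candidates: List[Candidate]) -> List[Candidate]:
    """Stand-in heuristic (an assumption, not the PR's algorithm): drop any
    candidate whose nodes are a strict subset of another candidate's nodes
    for the same backend."""
    kept = []
    for c in candidates:
        dominated = any(
            o.backend == c.backend and c.nodes < o.nodes for o in candidates
        )
        if not dominated:
            kept.append(c)
    return kept


def measure_all(candidates: List[Candidate], estimator: CountingMockEstimator):
    """Cost every candidate, returning a candidate -> latency map."""
    return {c: estimator.estimate(c) for c in candidates}


candidates = [
    Candidate("cuda", frozenset({0})),
    Candidate("cuda", frozenset({1})),
    Candidate("cuda", frozenset({0, 1})),
    Candidate("tvm", frozenset({0})),
    Candidate("tvm", frozenset({0, 1})),
    Candidate("tvm", frozenset({2})),
]

unpruned_estimator = CountingMockEstimator({"cuda": 1.0, "tvm": 2.0})
measure_all(candidates, unpruned_estimator)

pruned = prune_to_maximal(candidates)
pruned_estimator = CountingMockEstimator({"cuda": 1.0, "tvm": 2.0})
costs = measure_all(pruned, pruned_estimator)

# The test-side check: pruning must have reduced the number of measurements.
assert pruned_estimator.calls < unpruned_estimator.calls
```

A rule this simple can discard candidates that a real search would still need, which is one reason to treat any such pruning as speculative and to verify it through the estimator's call count rather than assume it is safe.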
