canesche opened a new pull request, #17104: URL: https://github.com/apache/tvm/pull/17104
Description

This pull request aims to enhance model optimization by adding a post-optimization step to MetaSchedule. The proposed approach involves the following steps:

1. Execution of MetaSchedule over an end-to-end model that requires optimization.
2. Selection of the best implementation identified by MetaSchedule for the given model.
3. Utilization of Droplet Search to exploit the selected candidate.

By using Droplet Search as a post-optimization step ([Droplet paper](https://homepages.dcc.ufmg.br/~fernando/publications/papers/DropletSearch.pdf)), we have been able to reduce the number of trials explored by MetaSchedule while still achieving faster kernel performance. We have observed this improvement on the following architectures: Nvidia A100, Nvidia 3080, AMD x86, and ARM A64FX. The results can be found in this report: [bennu paper](https://homepages.dcc.ufmg.br/~michaelcanesche/paper/bennu_meta_version.pdf)

Proposed Changes

- Integration of Droplet Search as a post-optimization methodology.
- Utilization of Droplet Search to exploit the best candidates identified by MetaSchedule.

Motivation

This pull request adds to MetaSchedule an exploitation phase based on the coordinate descent algorithm. By iteratively refining the best kernel identified by MetaSchedule, we achieve two key benefits:

1. Reduced Sample Requirements: Coordinate descent search minimizes the number of samples MetaSchedule needs to discover high-performing schedules.
2. Faster Kernels: The refined kernels exhibit improved execution speed compared to those found by MetaSchedule alone, even when it uses more samples.

Thus, this PR optimizes MetaSchedule along two crucial dimensions: search efficiency and kernel performance.

Testing and Validation

Extensive testing has been conducted to validate the efficacy and performance improvements achieved through the integration of MetaSchedule and Droplet Search.
Benchmarking tests have been performed across Nvidia A100, AMD x86, and ARM A64FX architectures to assess the impact on kernel speed and search time reduction compared with 10,000 trials from MetaSchedule execution. These results are available in Section 3 of this manuscript: [paper](https://homepages.dcc.ufmg.br/~michaelcanesche/paper/bennu_meta_version.pdf)

Additional Notes

This pull request builds upon prior research and experimentation in model optimization. The proposed approach improves end-to-end models across diverse hardware platforms while still reducing MetaSchedule's search time. We welcome the community’s feedback, suggestions, and contributions to further refine and enhance these methodologies.

Thank you.

Sincerely,
Michael Canesche, Gaurav Verma, and Fernando Pereira

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
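For readers unfamiliar with the exploitation phase described under Motivation, the following is a minimal sketch of plain coordinate descent over a discrete grid of schedule knobs, starting from a candidate that an earlier search has already found. It is an illustration only, not the code in this PR and not a TVM API: `coordinate_descent`, `toy_cost`, the knob sizes, and the starting point are all hypothetical, and the cost function is a synthetic stand-in for a measured kernel latency. Droplet Search as described in the linked paper may differ from this simplified version.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[int, ...]


def coordinate_descent(
    start: Sequence[int],
    sizes: Sequence[int],
    cost: Callable[[Point], float],
    max_trials: int = 100,
) -> Tuple[Point, float, int]:
    """Greedy coordinate descent over a grid of knob indices (illustrative).

    start: initial point, e.g. the best candidate from a prior search
    sizes: number of valid values per knob; knob i uses indices 0..sizes[i]-1
    cost:  hypothetical stand-in for a measured latency (lower is better)
    """
    best: Point = tuple(start)
    best_cost = cost(best)
    trials = 1  # the starting point counts as one evaluation
    improved = True
    while improved and trials < max_trials:
        improved = False
        # Neighbours differ from the current point by one step on one axis.
        neighbours: List[Point] = []
        for axis in range(len(best)):
            for step in (-1, 1):
                idx = best[axis] + step
                if 0 <= idx < sizes[axis]:
                    neighbours.append(best[:axis] + (idx,) + best[axis + 1:])
        # Move to the cheapest neighbour if it beats the current point.
        for cand in neighbours:
            if trials >= max_trials:
                break
            cand_cost = cost(cand)
            trials += 1
            if cand_cost < best_cost:
                best, best_cost, improved = cand, cand_cost, True
    return best, best_cost, trials


def toy_cost(point: Point) -> float:
    """Synthetic cost with a single minimum at (3, 5, 2); not a real benchmark."""
    target = (3, 5, 2)
    return 1.0 + sum((p - t) ** 2 for p, t in zip(point, target))


best, best_cost, trials = coordinate_descent(
    start=(0, 0, 0), sizes=(8, 8, 8), cost=toy_cost
)
print(best, best_cost, trials)
```

On this toy cost the search walks one grid step at a time toward the minimum and stops when no neighbour improves. In a real tuning setting each call to `cost` would be a hardware measurement, which is why the number of trials is tracked and capped.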
