canesche opened a new pull request, #17104:
URL: https://github.com/apache/tvm/pull/17104

   Description
   
   This pull request improves model optimization by adding a 
post-optimization step to MetaSchedule. The proposed approach involves the 
following steps:
   
   1. Run MetaSchedule on the end-to-end model to be optimized.
   2. Select the best implementation that MetaSchedule identifies for the 
given model.
   3. Apply Droplet Search to exploit the selected candidate.
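
   The three steps above can be illustrated with a minimal, self-contained 
sketch. It does not use any TVM API: random sampling stands in for 
MetaSchedule's exploration, and plain coordinate descent stands in for Droplet 
Search. The names `SPACE`, `toy_cost`, `explore`, `coordinate_descent`, and 
`tune` are hypothetical and exist only for this illustration, and the cost 
function is synthetic rather than a measured kernel runtime.

```python
import random

# Hypothetical search space: each knob lists its allowed values (e.g. tile sizes).
SPACE = [[1, 2, 4, 8, 16, 32], [1, 2, 4, 8, 16, 32], [1, 2, 4, 8]]


def toy_cost(idx):
    """Synthetic stand-in for a measured kernel runtime; lower is better."""
    x, y, z = (SPACE[k][i] for k, i in enumerate(idx))
    return (x - 8) ** 2 + (y - 16) ** 2 + (z - 2) ** 2 + 1.0


def explore(cost, n_trials, rng):
    """Steps 1 and 2: sample random candidates and keep the best one."""
    best, best_cost = None, float("inf")
    for _ in range(n_trials):
        cand = tuple(rng.randrange(len(axis)) for axis in SPACE)
        c = cost(cand)
        if c < best_cost:
            best, best_cost = cand, c
    return best, best_cost


def coordinate_descent(cost, start, start_cost):
    """Step 3: change one knob at a time to a neighbouring value while it helps."""
    current, current_cost = start, start_cost
    improved = True
    while improved:
        improved = False
        for k in range(len(SPACE)):
            for step in (-1, 1):
                i = current[k] + step
                if not 0 <= i < len(SPACE[k]):
                    continue  # neighbour falls outside the knob's range
                cand = current[:k] + (i,) + current[k + 1:]
                c = cost(cand)
                if c < current_cost:
                    current, current_cost = cand, c
                    improved = True
    return current, current_cost


def tune(cost, n_trials, seed=0):
    """Explore with a small budget, then refine the best candidate found."""
    rng = random.Random(seed)
    best, best_cost = explore(cost, n_trials, rng)
    return coordinate_descent(cost, best, best_cost)
```

   Calling `tune(toy_cost, n_trials=5)` returns the refined knob indices and 
their cost. On this toy cost, which is separable and has a single minimum per 
knob, the refinement reaches the minimum from any starting candidate; real 
kernel search spaces offer no such guarantee.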
   
   By using Droplet Search as a post-optimization step ([Droplet 
paper](https://homepages.dcc.ufmg.br/~fernando/publications/papers/DropletSearch.pdf)),
 we have been able to reduce the number of trials explored by MetaSchedule 
while still producing faster kernels. We have observed this 
improvement on the following architectures: Nvidia A100, Nvidia 3080, AMD x86, 
and ARM A64FX. The results can be found in this report: [bennu 
paper](https://homepages.dcc.ufmg.br/~michaelcanesche/paper/bennu_meta_version.pdf).
   
   Proposed Changes
   
   - Integrate Droplet Search into MetaSchedule as a post-optimization step.
   - Use Droplet Search to exploit the best candidates identified by 
MetaSchedule.
   
   Motivation
   
   This pull request adds to MetaSchedule an exploitation phase based on the 
coordinate descent algorithm. By iteratively refining the best kernel 
identified by MetaSchedule, we achieve two key benefits:
   1. Reduced Sample Requirements: Coordinate descent search reduces the 
number of samples MetaSchedule needs to discover high-performing schedules.
   2. Faster Kernels: The refined kernels run faster than those found by 
MetaSchedule alone, even when MetaSchedule uses more samples.
   
   Thus, this PR improves MetaSchedule along two dimensions: search 
efficiency and kernel performance.
   
   Testing and Validation
   
   We have tested the integration of MetaSchedule and Droplet Search 
extensively. Benchmarks on Nvidia A100, AMD x86, and ARM A64FX architectures 
measure the impact on kernel speed and search time against a baseline of 
MetaSchedule running 10,000 trials. These results are available in Section 3 
of this manuscript: 
[paper](https://homepages.dcc.ufmg.br/~michaelcanesche/paper/bennu_meta_version.pdf).
   
   Additional Notes
   
   This pull request builds upon prior research and experimentation in model 
optimization. The proposed approach speeds up end-to-end models across diverse 
hardware platforms while reducing MetaSchedule's search time. We welcome 
the community’s feedback, suggestions, and contributions to further refine 
this approach.
   
   Thank you.
   
   Sincerely,
   
   Michael Canesche, Gaurav Verma, and Fernando Pereira
   

