Lurkrazy opened a new pull request, #17126:
URL: https://github.com/apache/tvm/pull/17126

   This PR introduces the Dynamic Gradient Descent (DGD) Search algorithm for 
accelerating the auto-tuning process of GPU kernels within the 
Ansor/AutoScheduler framework. The DGD algorithm is designed to explore the 
search space more efficiently than the existing Genetic Algorithm-based 
approach. The following changes are included:
   
   1. **Dynamic Gradient Descent Search:**
      - Implements a new search strategy that uses gradient descent in a 
multi-dimensional tile-space.
   - Utilizes online measurements and a proxy model to guide the search 
process.
   
   2. **Record Processor:**
      - A new class to handle the processing and modification of measure 
records.
      - Includes methods to extract and modify SP node coordinates.
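   The core idea — gradient-descent-style movement through a discrete tile-space guided by measured costs — can be illustrated with a minimal, self-contained sketch. This is not the PR's implementation: the `measure` function below is a hypothetical stand-in for real on-device measurements or the proxy model, and the neighbor scheme is simplified to one-step moves per tile dimension.

   ```python
   def measure(coord):
       # Hypothetical cost function standing in for hardware measurement;
       # pretend the optimal tile configuration is (4, 8).
       return (coord[0] - 4) ** 2 + (coord[1] - 8) ** 2

   def neighbors(coord):
       # One-step moves along each tile dimension.
       for dim in range(len(coord)):
           for delta in (-1, 1):
               cand = list(coord)
               cand[dim] += delta
               if cand[dim] > 0:  # tile sizes stay positive
                   yield tuple(cand)

   def dgd_sketch(start, max_steps=100):
       # Greedy descent: repeatedly move to the best measured neighbor
       # until no neighbor improves (a local minimum in tile-space).
       best, best_cost = start, measure(start)
       for _ in range(max_steps):
           improved = False
           for cand in neighbors(best):
               cost = measure(cand)
               if cost < best_cost:
                   best, best_cost = cand, cost
                   improved = True
           if not improved:
               break
       return best, best_cost
   ```

   Starting from `(1, 1)`, this sketch walks to the synthetic optimum `(4, 8)` in a handful of measurements, rather than sampling the whole space — the same intuition, at toy scale, behind replacing genetic search with directed descent.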
   
   This implementation is based on the algorithm described in the paper 
"[Accelerated Auto-Tuning of GPU Kernels for Tensor 
Computations](https://dl.acm.org/doi/10.1145/3650200.3656626)" presented at 
ICS'24.
   
   Experimental evaluation on a number of matrix-matrix multiplication and 
convolution kernels shows that the DGD algorithm achieves an order-of-magnitude 
improvement in auto-tuning time while maintaining comparable code performance.
   
   ### Usage:
   To use the DGD Search algorithm, instantiate the 
`DynamicGradientSearchTuner` class with the desired parameters and call the 
`dynamic_gradient_search` method.
   
   ### Example:
   ```python
   tuner = auto_scheduler.dynamic_gradient_search.DynamicGradientSearchTuner(
       task, log_file, tune_option
   )
   tuner.dynamic_gradient_search()
   ```
   
   ### Experiments setup:
   The experiments ran the DGD Search algorithm under two budgets: a 1-hour 
time budget and the full duration used by Ansor, comparing against the 
performance Ansor achieves after its suggested number of trials. The models 
used for the evaluation were Bert, ResNet-50, and MobileNetV2, with the 
following configurations based on the Apache blog [Introducing TVM 
Auto-scheduler (a.k.a. Ansor)](https://tvm.apache.org/2021/03/03/intro-auto-scheduler):
   
   - **Bert:** 12000 trials, running on an Nvidia RTX 4090 for 6 hours.
   - **ResNet-50:** 20000 trials, running on an Nvidia RTX 4090 for 10 hours.
   - **MobileNetV2:** 16000 trials, running on an Nvidia RTX 4090 for 7 hours.
   
   ### Relative performance of the DGD Search algorithm within 1 hour and 
within the full duration used by Ansor
   
   | Networks      | Ratio (1 hour) | Ratio (full) |
   | ------------- | -------------- | ------------- |
   | Bert          | 93.71%         | 100.15%       |
   | ResNet-50     | 90.46%         | 96.34%        |
   | MobileNetV2   | 95.08%         | 101.75%       |
   
   This table shows the performance of DGD Search, relative to Ansor's final 
result, after a 1-hour budget and after the full duration used by Ansor. The 
ratios indicate that Dynamic Gradient Descent Search reaches comparable code 
performance within a significantly reduced time frame.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
