Lurkrazy opened a new pull request, #17126:
URL: https://github.com/apache/tvm/pull/17126
This PR introduces the Dynamic Gradient Descent (DGD) Search algorithm for
accelerating the auto-tuning process of GPU kernels within the
Ansor/AutoScheduler framework. The DGD algorithm is designed to explore the
search space more efficiently than the existing Genetic Algorithm-based
approach. The following changes are included:
1. **Dynamic Gradient Descent Search:**
- Implements a new search strategy that uses gradient descent in a
multi-dimensional tile-space.
- Utilizes online measurements and a proxy model to guide the search
process.
2. **Record Processor:**
- A new class to handle the processing and modification of measure
records.
- Includes methods to extract and modify the coordinates of SP (split)
nodes.
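As a rough illustration of the record processing described above, the sketch below pulls the tile sizes of split ("SP") steps out of a serialized measure record. The record layout and the function name here are simplified assumptions for illustration only, not the PR's actual `RecordProcessor` API.

```python
import json

def extract_sp_coordinates(record_line):
    """Extract the tile sizes chosen by every split ("SP") step from a
    serialized measure record (simplified layout; real Ansor records
    carry more fields)."""
    record = json.loads(record_line)
    steps = record["i"][1][1]  # transform steps of the recorded schedule
    coords = []
    for step in steps:
        if step[0] == "SP":          # split step: tile lengths at index 4
            coords.append(step[4])
    return coords

# A minimal hand-written record with two split steps and one fuse step.
example = json.dumps({"i": ["task", ["state", [
    ["SP", 2, 0, 512, [16, 4, 8], 1],
    ["SP", 2, 4, 512, [32, 2], 1],
    ["FU", 2, [0, 1]],
]]]})
print(extract_sp_coordinates(example))  # [[16, 4, 8], [32, 2]]
```

Modifying a record would then amount to writing new tile lengths back into these positions and re-serializing the record.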
This implementation is based on the algorithm described in the paper
"[Accelerated Auto-Tuning of GPU Kernels for Tensor
Computations](https://dl.acm.org/doi/10.1145/3650200.3656626)" presented at
ICS'24.
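The core idea of the descent in tile-space can be sketched, under simplifying assumptions, as a greedy walk over integer tile coordinates: from the current configuration, probe neighboring tile sizes, score them with measurements or a proxy cost model, and move to the best neighbor until no improvement remains. Everything below (function names, the toy cost function) is illustrative and not the PR's implementation.

```python
def tile_space_descent(cost, start, bounds, max_steps=100):
    """Greedy descent over an integer tile-space.

    cost:   maps a tuple of tile coordinates to a measured/predicted cost
    start:  initial coordinate tuple
    bounds: (lo, hi) inclusive range for each dimension
    """
    current = tuple(start)
    best = cost(current)
    for _ in range(max_steps):
        # Probe the +/-1 neighbors along every dimension (the "gradient" probe).
        neighbors = []
        for d in range(len(current)):
            for delta in (-1, 1):
                cand = list(current)
                cand[d] += delta
                lo, hi = bounds[d]
                if lo <= cand[d] <= hi:
                    neighbors.append(tuple(cand))
        # Move to the best-scoring neighbor, if it improves on the current point.
        new_best, new_point = min((cost(n), n) for n in neighbors)
        if new_best >= best:
            break  # local minimum in the tile-space
        best, current = new_best, new_point
    return current, best

# Toy quadratic "cost" with its minimum at (5, 3) inside the bounds.
point, value = tile_space_descent(
    cost=lambda p: (p[0] - 5) ** 2 + (p[1] - 3) ** 2,
    start=(0, 0),
    bounds=[(0, 8), (0, 8)],
)
print(point, value)  # (5, 3) 0
```

In the real algorithm the cost of a neighbor comes from hardware measurements and a proxy model rather than a closed-form function, which is what makes cutting down the number of probed points so valuable.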
Experimental evaluation on a number of matrix-matrix multiplication and
convolution kernels shows that the DGD algorithm achieves an order-of-magnitude
improvement in auto-tuning time while maintaining comparable code performance.
### Usage:
To use the DGD Search algorithm, instantiate the
`DynamicGradientSearchTuner` class with the desired parameters and call the
`dynamic_gradient_search` method.
### Example:
```python
tuner = auto_scheduler.dynamic_gradient_search.DynamicGradientSearchTuner(
    task, log_file, tune_option
)
tuner.dynamic_gradient_search()
```
### Experiments setup:
The experiments ran the DGD Search algorithm under two budgets: a 1-hour
time budget, and the full tuning duration used by Ansor. In both cases, the
resulting code performance was compared against that of Ansor after its
suggested number of trials. The models used for the evaluation were Bert,
ResNet-50, and MobileNetV2, with the following configurations based on the
Apache blog [Introducing TVM Auto-scheduler (a.k.a.
Ansor)](https://tvm.apache.org/2021/03/03/intro-auto-scheduler):
- **Bert:** 12000 trials, running on an Nvidia RTX 4090 for 6 hours.
- **ResNet-50:** 20000 trials, running on an Nvidia RTX 4090 for 10 hours.
- **MobileNetV2:** 16000 trials, running on an Nvidia RTX 4090 for 7 hours.
### Relative performance of the DGD Search algorithm within 1 hour and
within Ansor's full tuning duration

| Networks | DGD/Ansor (1 hour) | DGD/Ansor (full) |
| ------------- | ------------------ | ---------------- |
| Bert | 93.71% | 100.15% |
| ResNet-50 | 90.46% | 96.34% |
| MobileNetV2 | 95.08% | 101.75% |
This table presents the performance of the DGD Search algorithm relative to
Ansor's final performance, measured both after a 1-hour budget and after the
full duration Ansor used. The ratios show that Dynamic Gradient Descent
Search reaches comparable (in two cases slightly better) performance within
a significantly reduced time frame.