guberti opened a new pull request, #13752:
URL: https://github.com/apache/tvm/pull/13752
**This pull request is not ready for review.**
In #13242, I rewrote microTVM's convolution schedules for a major
performance improvement. While tests demonstrated that the new schedules
worked, they could not yet be used with `relay.build`.
This pull request expands the functionality of #13242 and adds new
`legalize` and `alter_op` passes to take advantage of the quantized schedules.
This dramatically improves performance on some models, dramatically cuts RAM
usage, and removes the _need_ for autotuning on microTVM. More specifically,
for the `vww` model from MLPerf Tiny, this pull request:
- Improves **untuned** performance from `1741 ms` to `225 ms` - a **7.7x**
improvement!
- Improves **tuned** performance from `337 ms` to `225 ms`.
- This closes **80%** of the performance gap between us and the current
state-of-the-art (which is `205 ms`).
- Reduces memory consumption by **73 KB** (a large amount on
microcontrollers!) by eliminating intermediate buffers.
[TODO work with Mehrdad so he can sign off on these numbers]
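The layout alternation underlying these passes rests on a simple fact: a conv2d's result does not depend on how its tensors are stored, so a graph-level pass can insert layout transforms once and let each schedule run in its preferred format, with no runtime conversion inside the operator. A minimal numpy sketch of that equivalence (the `conv2d_nhwc`/`conv2d_nchw` helpers are illustrations written for this example, not TVM APIs):

```python
import numpy as np

def conv2d_nhwc(data, kernel):
    """Direct NHWC conv2d (HWIO kernel), stride 1, no padding - illustration only."""
    n, h, w, ci = data.shape
    kh, kw, _, co = kernel.shape
    out = np.zeros((n, h - kh + 1, w - kw + 1, co))
    for y in range(h - kh + 1):
        for x in range(w - kw + 1):
            patch = data[:, y:y + kh, x:x + kw, :]           # (N, KH, KW, Cin)
            out[:, y, x, :] = np.tensordot(patch, kernel,
                                           axes=([1, 2, 3], [0, 1, 2]))
    return out

def conv2d_nchw(data, kernel):
    """The same operator computed from NCHW data / OIHW kernel storage."""
    nhwc = data.transpose(0, 2, 3, 1)                        # NCHW -> NHWC
    hwio = kernel.transpose(2, 3, 1, 0)                      # OIHW -> HWIO
    return conv2d_nhwc(nhwc, hwio).transpose(0, 3, 1, 2)     # back to NCHW

rng = np.random.default_rng(0)
data_nhwc = rng.standard_normal((1, 8, 8, 4))
kernel_hwio = rng.standard_normal((3, 3, 4, 6))

# The "alter layout" rewrite: transform layouts once at graph level, run the
# schedule's preferred layout, and the numerical result is unchanged.
ref = conv2d_nhwc(data_nhwc, kernel_hwio)
alt = conv2d_nchw(data_nhwc.transpose(0, 3, 1, 2),
                  kernel_hwio.transpose(3, 2, 0, 1)).transpose(0, 2, 3, 1)
assert np.allclose(ref, alt)
```

Because the transforms are explicit graph nodes, constant folding removes the kernel-side transform entirely, which is part of why no tuning is needed to reach the fast schedules.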
To enable the schedules that grant these performance improvements, this pull
request:
- Adds `out_layout` support to the regular and depthwise conv2d schedules
from #13242.
- Generalizes the schedules from #13242 to be more widely applicable.
- Adds a layout alternation pass to ensure regular and depthwise conv2d
schedules always get their desired input formats.
- Adds a `conv2d -> depthwise conv2d -> unpadded conv2d` rewrite step to
remove empty channels from `conv2d` operators.
- Adds a `conv2d -> average pool -> dense` rewrite step to remove empty
channels from `conv2d` operators.
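Both channel-removal rewrites above exploit the same observation: an output channel whose kernel slice is entirely zero contributes nothing downstream, so it can be dropped along with the matching weights of the consumer, shrinking both compute and intermediate buffers. A hedged numpy sketch of the idea for a `conv2d -> average pool -> dense` chain (illustration only; the real pass operates on Relay and must also account for quantization parameters like biases and zero points):

```python
import numpy as np

def conv2d_nhwc(data, kernel):
    """Direct NHWC conv2d (HWIO kernel), stride 1, no padding - illustration only."""
    n, h, w, ci = data.shape
    kh, kw, _, co = kernel.shape
    out = np.zeros((n, h - kh + 1, w - kw + 1, co))
    for y in range(h - kh + 1):
        for x in range(w - kw + 1):
            patch = data[:, y:y + kh, x:x + kw, :]
            out[:, y, x, :] = np.tensordot(patch, kernel,
                                           axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
data = rng.standard_normal((1, 8, 8, 4))
kernel = rng.standard_normal((3, 3, 4, 6))
kernel[..., [1, 4]] = 0.0               # output channels 1 and 4 are "empty"
dense_w = rng.standard_normal((6, 10))  # dense layer after global average pool

# Original graph: conv2d -> global average pool -> dense
feat = conv2d_nhwc(data, kernel).mean(axis=(1, 2))           # (N, 6)
ref = feat @ dense_w

# Rewritten graph: prune empty kernel channels AND the matching dense rows.
keep = [c for c in range(kernel.shape[-1]) if np.any(kernel[..., c])]
feat_p = conv2d_nhwc(data, kernel[..., keep]).mean(axis=(1, 2))  # (N, 4)
out_p = feat_p @ dense_w[keep, :]
assert np.allclose(ref, out_p)
```

The pruned graph never materializes the empty channels, which is where the intermediate-buffer RAM savings come from.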
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]