ibsidorenko opened a new pull request, #12659:
URL: https://github.com/apache/tvm/pull/12659
This commit adds a high-performance implementation of the `fixed_point_multiply`
operation based on the Hexagon intrinsics for the vmpye/vmpyo instructions.
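A minimal sketch of the arithmetic idea, assuming (as we understand HVX) that vmpye and vmpyo multiply each 32-bit lane by the even (low) and odd (high) 16-bit halves of the other operand. Under that assumption, a 32x32 Q31 multiply can be rebuilt from two 32x16 partial products. The helper names `split_q31_multiply` and `reference_q31_multiply` are hypothetical. The target result is assumed to be round-half-up of `x*y*2^-31`, and the model ignores the instructions' saturation and exact rounding flags. It is plain Python, not the TVM or Hexagon code.

```python
import random


def reference_q31_multiply(x: int, y: int) -> int:
    """round(x * y * 2**-31), rounding half up, computed in one step."""
    return (x * y + (1 << 30)) >> 31


def split_q31_multiply(x: int, y: int) -> int:
    """Same result built from two 32x16-bit partial products (hypothetical model)."""
    y_lo = y & 0xFFFF        # low ("even") halfword, unsigned 0..65535
    y_hi = y >> 16           # high ("odd") halfword, signed
    low = (x * y_lo) >> 16   # low partial product, truncated by 16 bits
    acc = x * y_hi + low     # equals floor(x * y / 2**16)
    return (acc + (1 << 14)) >> 15  # one round-half-up for the remaining 15 bits


# Randomized comparison over signed 32-bit inputs.
rng = random.Random(0)
for _ in range(10_000):
    a = rng.randint(-(1 << 31), (1 << 31) - 1)
    b = rng.randint(-(1 << 31), (1 << 31) - 1)
    assert split_q31_multiply(a, b) == reference_q31_multiply(a, b)
```

Because the rounding constant is added only once, at the end, the split version matches the single-rounding reference exactly in this model. Any double rounding would therefore come from a separate rounding shift applied afterwards, matching the `round(round(x*y)*2^-s)` formula given in this PR.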
Benchmark of the `fixed_point_multiply` op with a (1,8,56,56,32) input
tensor on Qualcomm SM8350:
* default implementation: **10.06 ms**
* optimized implementation: **1.42 ms**
* speedup: **~7x** (10.06 / 1.42 ≈ 7.1)
Please note that this introduces a small round-up error in some
corner cases with a negative shift argument (the same as for ARM CPU, see
[PR#5980](https://github.com/apache/tvm/pull/5980)). This is because we round
twice instead of only once:
* original q_multiply_shift: `round(x*y*2^-s)`
* Hexagon q_multiply_shift: `round(round(x*y)*2^-s)`
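To make the corner case concrete, here is a small plain-Python model of the two formulas above. It assumes the product carries q = 31 fractional bits, treats `s` as a right-shift amount of at least 1, and uses round-half-up (add half, then arithmetic shift) for every rounding. The function names are hypothetical, and this is not the TVM or Hexagon code.

```python
def round_once(p: int, q: int, s: int) -> int:
    """Original formula: round(p * 2**-(q + s)) with a single round-half-up."""
    total = q + s
    return (p + (1 << (total - 1))) >> total


def round_twice(p: int, q: int, s: int) -> int:
    """Double rounding: round the product to q fractional bits, then round the shift."""
    t = (p + (1 << (q - 1))) >> q     # first rounding (after the multiply)
    return (t + (1 << (s - 1))) >> s  # second rounding (after the shift)


# Corner case: x*y*2**-32 == 0.25 exactly, which should round to 0.
x, y, q, s = 1 << 15, 1 << 15, 31, 1
print(round_once(x * y, q, s), round_twice(x * y, q, s))  # 0 1

# Exhaustive check on small widths: double rounding never rounds below
# the single-rounded result and overshoots it by at most 1.
for q_small in range(1, 5):
    for s_small in range(1, 5):
        for p in range(-4096, 4096):
            diff = round_twice(p, q_small, s_small) - round_once(p, q_small, s_small)
            assert 0 <= diff <= 1
```

In this model the error is always a round-up of at most one unit: the first rounding can push the intermediate value onto a tie, which the second round-half-up then rounds upward.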