ibsidorenko opened a new pull request, #13080: URL: https://github.com/apache/tvm/pull/13080
The main goal of this commit is to improve performance on the Hexagon target while preserving performance and accuracy on x86, GPU, and other targets.

During the QNN canonicalization pass, the `qnn.requantize` operation is lowered into a sequence of multiply, add, and shift operations when the scale quantization parameter is a vector of scalars. This commit adds a new Relay per-channel/per-axis FixedPointMultiply operation, which is used in the lowering of `qnn.requantize`. The per-channel/per-axis FixedPointMultiply is implemented through the `tir.q_multiply_shift_per_axis` intrinsic. For the Hexagon target it overrides the default implementation and generates HVX `vmpye`/`vmpyo` instructions (see `_q_multiply_shift_per_axis_hexagon`). For all other targets it uses the default implementation (64-bit arithmetic).

**Performance/accuracy measurement:**
* _CPU (x86) target_: accuracy and performance are unchanged. Other targets should behave the same (anything else is a bug).
* _Hexagon target_: `qnn.requantize` speeds up 7x-9x (Snapdragon 888, 4.4 ms -> 0.5 ms). Unfortunately, in some cases the requantize output can differ by 1 from the reference output. This tradeoff between performance and accuracy must be weighed when choosing the optimized implementation.
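To make the multiply/add/shift lowering concrete, below is a minimal Python sketch of a fixed-point multiply applied per channel. It is illustrative only, not TVM's actual implementation: the function names `q_multiply_shift` and `requantize_per_channel` are hypothetical helpers, and the exact shift/rounding conventions of `tir.q_multiply_shift_per_axis` may differ.

```python
def q_multiply_shift(x, multiplier, shift):
    """Multiply int32 `x` by a Q0.31 fixed-point `multiplier`, then shift.

    Sketch of 64-bit fixed-point arithmetic: the real-valued scale is
    represented as multiplier * 2**(shift - 31), with round-to-nearest.
    """
    prod = x * multiplier            # 64-bit product
    total_shift = 31 - shift         # 31 bits for the Q0.31 format
    rounding = 1 << (total_shift - 1)  # round-to-nearest before shifting
    return (prod + rounding) >> total_shift

def requantize_per_channel(values, multipliers, shifts, zero_point):
    """Apply a different fixed-point scale to each channel (hypothetical)."""
    return [q_multiply_shift(v, m, s) + zero_point
            for v, m, s in zip(values, multipliers, shifts)]

# Example: a multiplier of 1 << 30 with shift 0 encodes a scale of 0.5.
print(q_multiply_shift(100, 1 << 30, 0))  # 100 * 0.5 = 50
print(q_multiply_shift(7, 1 << 30, 0))    # 7 * 0.5 = 3.5, rounds to 4
```

As the PR notes, a vectorized HVX implementation that avoids full 64-bit arithmetic can produce results that differ by 1 from this reference-style rounding.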
