DzAvril opened a new pull request, #11155:
URL: https://github.com/apache/tvm/pull/11155

   
   **Problems introduced by the "two rounding" behavior on arm_cpu**
   op: `q_multiply_shift`, used in re-quantization.
   
   The DEFAULT path:
   
   - file: [src/target/intrin_rule.cc](https://github.com/apache/tvm/blob/main/src/target/intrin_rule.cc#L154)
   
   The NEON path:
   
   - file: 
[python/tvm/topi/arm_cpu/tensor_intrin.py](https://github.com/apache/tvm/blob/main/python/tvm/topi/arm_cpu/tensor_intrin.py#L1147)
   
   The NEON path may produce different values (due to the two roundings), compared with the DEFAULT path, within a single layer.
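   To illustrate why the two paths can diverge, here is a simplified Python sketch (not the actual TVM lowering): it uses plain Python ints and ignores the saturation and negative-value handling the real intrinsics perform. The DEFAULT-style path rounds once when scaling the full product, while the NEON-style path (modeled after a rounding doubling-high-multiply like `sqrdmulh`, followed by a rounding shift right) rounds twice, and the intermediate rounding can change the result by one LSB:

   ```python
   # Simplified sketch (plain Python ints, no saturation; the real
   # intrinsics saturate and round half away from zero for negatives).

   def single_round(x, m, right_shift):
       """DEFAULT-style path: one rounding of x * m / 2**(31 + right_shift)."""
       total = 31 + right_shift
       return (x * m + (1 << (total - 1))) >> total

   def two_round(x, m, right_shift):
       """NEON-style path: rounding doubling-high-multiply (like sqrdmulh),
       then a rounding shift right -- two separate roundings."""
       high = (2 * x * m + (1 << 30)) >> 31          # first rounding
       if right_shift == 0:
           return high
       return (high + (1 << (right_shift - 1))) >> right_shift  # second rounding

   # The intermediate rounding can shift the result by one:
   x, m, s = 1, 1 << 30, 1   # m encodes a fixed-point multiplier of 0.5
   print(single_round(x, m, s))  # -> 0
   print(two_round(x, m, s))     # -> 1
   ```

   With the same inputs, the single-rounding path yields 0 while the double-rounding path yields 1, which is exactly the kind of one-off difference that appears within a single layer when both paths are mixed.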
   
   The problem is that, on `arm_cpu`, it will sometimes use the DEFAULT path and sometimes the NEON path:
   
   - If the innermost axis is a multiple of four and vectorization is applied, the NEON path is enabled
   - Otherwise, the DEFAULT path is used
   
   BTW, it looks like the "two rounding" behavior is important for producing bit-exact results matching TFLite/QNNPACK, see
   
   - [Supporting bit exact TFLite QNN 
inference](https://discuss.tvm.apache.org/t/supporting-bit-exact-tflite-qnn-inference/5528)
   - [TFLite Rounding](https://discuss.tvm.apache.org/t/tflite-rounding/6287)
   
   Discussion in TVM forum: 
https://discuss.tvm.apache.org/t/quantization-aligning-result-of-tvm-to-torchs/12225
   
   P.S. The patch in this PR is created by @wangxunx, I helped to send this PR.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
