leandron commented on PR #16653: URL: https://github.com/apache/tvm/pull/16653#issuecomment-2022328650
> > Hello, @leandron. I found in [cmsis.py](https://github.com/apache/tvm/blob/ff3716b83a72c2ff261c492f259e1fcd260600ce/python/tvm/relay/op/contrib/cmsisnn.py#L90) that the scale of softmax must be 1/256 and the zero point must be -128. Why is that? According to the formula Q(x_fp32, scale, zero_point) = round(x_fp32/scale) + zero_point, scale and zp should be adjustable (for example, in the case where scale is 1/128 and zp is 0, it should still meet the conditions for int8), right?
> >
> > By the way, in my testing of the Paddle model, the scale is 0.0078649195 (close to 1/127), and the zero point is 0.

The `SOFTMAX` operator from CMSIS-NN is, by design, bit-accurate with TFLite. Some relevant documentation links:

* Softmax operator in CMSIS-NN: https://arm-software.github.io/CMSIS_5/NN/html/group__Softmax.html
* Quantization specification in TFLite: https://www.tensorflow.org/lite/performance/quantization_spec

Note the following restriction on softmax in the TFLite quantization specification:

```
SOFTMAX
  Input 0:
    data_type  : int8
    range      : [-128, 127]
    granularity: per-tensor
  Output 0:
    data_type  : int8
    range      : [-128, 127]
    granularity: per-tensor
    restriction: (scale, zero_point) = (1.0 / 256.0, -128)
```

In summary, the restriction is expected: it follows from bit-accurate compatibility with TFLite.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
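
As an illustration only (this is not taken from the CMSIS-NN or TFLite sources), the sketch below applies the quantization formula quoted in the question, Q(x_fp32, scale, zero_point) = round(x_fp32/scale) + zero_point, to the two parameter sets under discussion. The helper names `quantize`, `dequantize`, `representable_range`, and `codes_in_unit_interval` are hypothetical, and saturating to [-128, 127] is an assumption about how out-of-range values are handled. It shows one plausible reading of the restriction: with (1/256, -128) every int8 code maps into [0, 1), which is where softmax outputs live, whereas with (1/128, 0) half of the codes represent negative values that softmax never produces.

```python
QMIN, QMAX = -128, 127  # int8 range from the TFLite spec excerpt


def quantize(x_fp32, scale, zero_point):
    """Q(x) = round(x / scale) + zero_point, saturated to int8 (assumed)."""
    # Note: Python's round() rounds halves to even; real kernels may differ.
    q = round(x_fp32 / scale) + zero_point
    return max(QMIN, min(QMAX, q))


def dequantize(q, scale, zero_point):
    """Inverse mapping: the real value represented by int8 code q."""
    return (q - zero_point) * scale


def representable_range(scale, zero_point):
    """Smallest and largest real values the int8 codes can represent."""
    return (dequantize(QMIN, scale, zero_point),
            dequantize(QMAX, scale, zero_point))


def codes_in_unit_interval(scale, zero_point):
    """How many of the 256 int8 codes land in [0, 1)."""
    return sum(
        1
        for q in range(QMIN, QMAX + 1)
        if 0.0 <= dequantize(q, scale, zero_point) < 1.0
    )


if __name__ == "__main__":
    for scale, zp in [(1.0 / 256.0, -128), (1.0 / 128.0, 0)]:
        lo, hi = representable_range(scale, zp)
        used = codes_in_unit_interval(scale, zp)
        print(f"scale={scale}, zp={zp}: range=[{lo}, {hi}], "
              f"codes in [0, 1) = {used}")
```

Both parameter sets are valid int8 quantizations in the general sense. The point of the comment above is narrower: the kernel matches TFLite bit for bit, and TFLite fixes the softmax output parameters.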
