sisleyli opened a new issue, #11867:
URL: https://github.com/apache/tvm/issues/11867
The description of `quantize` in `/python/tvm/relay/qnn/op/qnn.py` reads:
```python
def quantize(data, output_scale, output_zero_point, axis=-1, out_dtype="int8"):
r"""Quantize op
This operator takes float32 as input and produces quantized int8 or
uint8 as output.
The input tensor can be of any shape. The output shape is the same as
input shape.
Q_output = clamp((round(input_tensor/output_scale) + output_zero_point),
out_dtype::min,
out_dtype::max)
Parameters
```
According to this docstring, the op computes
`clamp(round(input_tensor/output_scale) + output_zero_point, out_dtype::min, out_dtype::max)`,
i.e. it rounds first and clamps afterwards,
but in `/src/relay/qnn/op/quantize.cc`:
```C++
const int32_t min_val = GetQmin(out_dtype);
const int32_t max_val = GetQmax(out_dtype);
auto scale_data = Divide(input_tensor, expanded_output_scale);
auto add_zero_point =
    Add(scale_data, Cast(expanded_output_zero_point, DataType::Float(32)));
auto clamped_output = Clip(add_zero_point, min_val, max_val);
auto rounded_clamped_output = Round(clamped_output);
return Cast(rounded_clamped_output, out_dtype);
```
So the actual implementation appears to be
`round(clamp(input_tensor/output_scale + output_zero_point, out_dtype::min, out_dtype::max))`,
i.e. it clamps first and rounds afterwards.
Which behavior does TVM intend: round first or clamp first?
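For what it's worth, here is a small NumPy sketch of the two orderings (my own illustration, not TVM code). On inputs that avoid exact .5 rounding ties, the two formulas seem to produce identical int8 results whenever the zero point and clamp bounds are integers:

```python
import numpy as np

QMIN, QMAX = -128, 127  # int8 range, matching GetQmin/GetQmax for "int8"

def quantize_doc_order(x, scale, zero_point):
    """Docstring order: clamp(round(x / scale) + zero_point, qmin, qmax)."""
    return np.clip(np.round(x / scale) + zero_point, QMIN, QMAX).astype(np.int8)

def quantize_impl_order(x, scale, zero_point):
    """quantize.cc order: round(clamp(x / scale + zero_point, qmin, qmax))."""
    return np.round(np.clip(x / scale + zero_point, QMIN, QMAX)).astype(np.int8)

x = np.linspace(-10.0, 10.0, 10001)  # grid chosen so x/scale never lands on a .5 tie
a = quantize_doc_order(x, scale=0.05, zero_point=3)
b = quantize_impl_order(x, scale=0.05, zero_point=3)
print(np.array_equal(a, b))
```

Note the two orderings can still disagree exactly at .5 ties under round-half-to-even when the zero point is odd (e.g. `round(7.5) + 3 = 11` but `round(10.5) = 10`), so the docstring and the lowering should agree on one order regardless.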
### Expected behavior
Quantize description is consistent with the actual implementation
### Actual behavior
Quantize description is inconsistent with the actual implementation
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]