ZhennanQin commented on a change in pull request #13697: [MKLDNN] Enable signed
int8 support for convolution.
URL: https://github.com/apache/incubator-mxnet/pull/13697#discussion_r244628443
##########
File path: example/quantization/imagenet_gen_qsym_mkldnn.py
##########
@@ -140,8 +140,8 @@ def save_params(fname, arg_params, aux_params,
logger=None):
' thresholds. This mode is expected to produce
the best inference accuracy of all three'
' kinds of quantized models if the calibration
dataset is representative enough of the'
' inference dataset.')
- parser.add_argument('--quantized-dtype', type=str, default='uint8',
- choices=['int8', 'uint8'],
+ parser.add_argument('--quantized-dtype', type=str, default='auto',
Review comment:
> Do we do this for the first negative input into subgraphs handled by
mkldnn? If so is there an advantage to doing the reorder into a signed type as
opposed to unsigned?
In MXNet, the transformation from fp32 to uint8/int8 is done by the quantize
operator, which is backed by an MKL-DNN reorder. The user can decide which
out_type the `_contrib_quantize_v2` operator uses via the quantize script
option `--quantized-dtype`; the valid choices are 'uint8', 'int8' and 'auto'.
For 'uint8', every `_contrib_quantize_v2` operator will try to transform data
from fp32 to uint8. It's best to use this mode only when there are no negative
values in the network, e.g. resnet excluding the first convolution.
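As a rough illustration (not the actual MKL-DNN reorder code), the uint8 mode maps a calibrated non-negative fp32 range onto [0, 255]; the function name and calibration parameter below are hypothetical:

```python
import numpy as np

def quantize_uint8(data, max_calib):
    # Hypothetical sketch: map non-negative fp32 values in
    # [0, max_calib] onto the full uint8 range [0, 255].
    scale = 255.0 / max_calib
    q = np.clip(np.round(data * scale), 0, 255).astype(np.uint8)
    return q, scale

q, scale = quantize_uint8(np.array([0.0, 0.5, 1.0], dtype=np.float32),
                          max_calib=1.0)
# q == [0, 128, 255]
```

Any negative input would be clipped to 0 here, which is why this mode only suits non-negative activations.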
For 'int8', every `_contrib_quantize_v2` operator will try to transform data
from fp32 to int8. This mode is the most adaptable, since it can handle both
positive and negative values, but the disadvantage is that for non-negative
data only 7 bits are used, causing accuracy loss, and the performance is a bit
slower.
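The 7-bit effect can be sketched the same way (again a hypothetical helper, not the library implementation): symmetric int8 quantization maps [-abs_max, abs_max] onto [-127, 127], so non-negative inputs only ever reach [0, 127].

```python
import numpy as np

def quantize_int8(data, abs_max_calib):
    # Hypothetical sketch: symmetric int8 quantization maps
    # [-abs_max, abs_max] onto [-127, 127]; non-negative inputs
    # therefore land in [0, 127], i.e. only 7 of the 8 bits.
    scale = 127.0 / abs_max_calib
    q = np.clip(np.round(data * scale), -127, 127).astype(np.int8)
    return q, scale

q, _ = quantize_int8(np.array([0.0, 1.0], dtype=np.float32),
                     abs_max_calib=1.0)
# q == [0, 127] -- half the resolution of the uint8 case
```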
For 'auto', each `_contrib_quantize_v2` operator will transform data from fp32
to either uint8 or int8 on demand: if the calibration result shows that a
layer's output has negative values, its out_type will be int8; otherwise it
will be uint8. int8 is then used only for data with negative values, so we get
the best accuracy and performance by mixing uint8 and int8 across layers.
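The 'auto' rule described above boils down to a per-layer check on the calibrated minimum; a minimal sketch (function name is hypothetical):

```python
def choose_out_type(calib_min):
    # Sketch of the 'auto' rule: a negative calibrated minimum
    # forces int8, otherwise uint8 keeps full 8-bit resolution.
    return 'int8' if calib_min < 0 else 'uint8'

print(choose_out_type(-0.3))  # int8
print(choose_out_type(0.0))   # uint8
```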
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services