xinyu-intel commented on a change in pull request #15910: [Quantization]support exclude operators while quantization
URL: https://github.com/apache/incubator-mxnet/pull/15910#discussion_r315097897
##########
File path: python/mxnet/contrib/quantization.py
##########
@@ -821,16 +803,13 @@ def calib_graph(qsym, arg_params, aux_params, collector,
     return qsym, qarg_params, aux_params
-def quantize_net(network, quantized_dtype='auto', exclude_layers=None,
-                 exclude_layers_match=None, calib_data=None,
-                 data_shapes=None, calib_mode='none', num_calib_examples=None,
-                 ctx=cpu(), logger=logging):
+def quantize_net(network, quantized_dtype='auto',
+                 exclude_layers=None, exclude_layers_match=None, exclude_operators=None,
+                 calib_data=None, data_shapes=None, calib_mode='none',
+                 num_calib_examples=None, ctx=cpu(), logger=logging):
     """User-level API for Gluon users to generate a quantized SymbolBlock from
     a FP32 HybridBlock w/ or w/o calibration.
     The backend quantized operators are only enabled for Linux systems. Please do not run
     inference using the quantized models on Windows for now.
-    The quantization implementation adopts the TensorFlow's approach:
-    https://www.tensorflow.org/performance/quantization.
-    The calibration implementation borrows the idea of Nvidia's 8-bit Inference with TensorRT:
-    http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf
-    and adapts the method to MXNet.
 Review comment:
   file too long (>1000 lines) :(
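
 The diff adds an `exclude_operators` parameter alongside the existing `exclude_layers` (exact names) and `exclude_layers_match` (substring patterns) filters. As a minimal self-contained sketch of how three such filters can be combined when selecting nodes to quantize (the node representation and the helper name below are illustrative, not MXNet's actual internals):

```python
def select_nodes_to_quantize(nodes, exclude_layers=None,
                             exclude_layers_match=None,
                             exclude_operators=None):
    """Illustrative filter: keep nodes eligible for quantization.

    nodes: list of (layer_name, op_name) tuples (hypothetical representation).
    exclude_layers: exact layer names to skip.
    exclude_layers_match: substrings; skip any layer whose name contains one.
    exclude_operators: operator types to skip (the behavior this PR adds).
    """
    exclude_layers = set(exclude_layers or [])
    exclude_layers_match = list(exclude_layers_match or [])
    exclude_operators = set(exclude_operators or [])
    kept = []
    for name, op in nodes:
        if name in exclude_layers:
            continue
        if any(pat in name for pat in exclude_layers_match):
            continue
        if op in exclude_operators:
            continue
        kept.append((name, op))
    return kept


nodes = [("conv0", "Convolution"),
         ("flatten0", "Flatten"),
         ("fc0", "FullyConnected")]
# Excluding the Flatten operator type drops flatten0 but keeps the rest:
print(select_nodes_to_quantize(nodes, exclude_operators=["Flatten"]))
# -> [('conv0', 'Convolution'), ('fc0', 'FullyConnected')]
```

 In the real API the exclusions are applied while building the quantized symbol, so excluded layers and operator types simply remain FP32 in the resulting SymbolBlock.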
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services