bgawrych commented on a change in pull request #19587:
URL: https://github.com/apache/incubator-mxnet/pull/19587#discussion_r536247183

##########
File path: example/quantization/README.md
##########

@@ -0,0 +1,184 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements. See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership. The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License. You may obtain a copy of the License at -->
+<!--- -->
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+<!--- -->
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied. See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+# Model Quantization with Calibration Examples
+
+This folder contains examples of quantizing an FP32 model to an (U)INT8 model with Intel® MKL-DNN.
+
+<h2 id="0">Contents</h2>
+
+* [1. Model Quantization with Intel® MKL-DNN](#1)
+
+<h2 id="1">Model Quantization with Intel® MKL-DNN</h2>
+
+Intel® MKL-DNN supports quantization with subgraph features on Intel® CPU platforms and can bring performance improvements on the [Intel® Xeon® Scalable Platform](https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-platform.html). To apply the quantization flow to your project directly, please refer to [Optimize custom models with MKL-DNN backend](#TODO(agrygielski)).
+
+```
+usage: python imagenet_gen_qsym_mkldnn.py [-h] [--model MODEL] [--epoch EPOCH]
+                                          [--no-pretrained] [--batch-size BATCH_SIZE]
+                                          [--calib-dataset CALIB_DATASET]
+                                          [--image-shape IMAGE_SHAPE]
+                                          [--data-nthreads DATA_NTHREADS]
+                                          [--num-calib-batches NUM_CALIB_BATCHES]
+                                          [--exclude-first-conv] [--shuffle-dataset]
+                                          [--calib-mode CALIB_MODE]
+                                          [--quantized-dtype {auto,int8,uint8}]
+                                          [--quiet]
+
+Generate a calibrated quantized model from an FP32 model with Intel MKL-DNN support
+
+optional arguments:
+  -h, --help            show this help message and exit
+  --model MODEL         model to be quantized. If no-pretrained is set then
+                        the model must be provided in the `model` directory in
+                        the same path as this python script, default is `resnet50_v1`
+  --epoch EPOCH         number of epochs, default is `0`
+  --no-pretrained       if enabled, the pretrained model will not be downloaded
+                        from the MXNet or Gluon-CV model zoo, default is `False`
+  --batch-size BATCH_SIZE
+                        batch size to be used when calibrating the model,
+                        default is `32`
+  --calib-dataset CALIB_DATASET
+                        path of the calibration dataset, default is `data/val_256_q90.rec`
+  --image-shape IMAGE_SHAPE
+                        number of channels, height and width of the input image
+                        separated by commas, default is `3,224,224`
+  --data-nthreads DATA_NTHREADS
+                        number of threads for data loading, default is `0`
+  --num-calib-batches NUM_CALIB_BATCHES
+                        number of batches for calibration, default is `10`
+  --exclude-first-conv  exclude the first conv layer from quantization, since
+                        the input data may have negative values, which are not
+                        supported at the moment
+  --shuffle-dataset     shuffle the calibration dataset
+  --calib-mode CALIB_MODE
+                        calibration mode used for generating the calibration
+                        table for the quantized symbol; supports 1. none: no
+                        calibration will be used. The thresholds for
+                        quantization will be calculated on the fly. This will
+                        result in an inference speed slowdown and loss of
+                        accuracy in general. 2. naive: simply take the min and
+                        max values of layer outputs as thresholds for
+                        quantization. In general, the inference accuracy
+                        worsens with more examples used in calibration. It is
+                        recommended to use `entropy` mode as it produces more
+                        accurate inference results. 3. entropy: calculate the
+                        KL divergence of the fp32 output and quantized output
+                        for optimal thresholds. This mode is expected to
+                        produce the best inference accuracy of all three kinds
+                        of quantized models if the calibration dataset is
+                        representative enough of the inference dataset.
+                        default is `entropy`
+  --quantized-dtype {auto,int8,uint8}
+                        quantization destination data type for input data,
+                        default is `auto`
+  --quiet               suppress most of the log output
+```
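+
+For example, the following invocation is a minimal sketch of a calibration run (it assumes the pretrained `resnet50_v1` model and the default calibration dataset path `data/val_256_q90.rec`; all flags used are documented above):
+
+```
+python imagenet_gen_qsym_mkldnn.py --model=resnet50_v1 --num-calib-batches=5 --calib-mode=naive
+```
+
+A run like this should produce a quantized symbol file such as `./model/resnet50_v1-quantized-5batches-naive-symbol.json`, which is the file used in the benchmark example below.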
+
+A new benchmark script, `launch_inference_mkldnn.sh`, has been designed to launch performance benchmarks for float32 or int8 image-classification models with Intel® MKL-DNN.
+
+```
+usage: bash ./launch_inference_mkldnn.sh -s symbol_file [-b batch_size] [-iter iteration] [-ins instance] [-c cores/instance] [-h]
+
+arguments:
+  -h, --help            show this help message and exit
+  -s, --symbol_file     symbol file for benchmark, required
+  -b, --batch_size      inference batch size
+                        default: 64
+  -iter, --iteration    inference iterations
+                        default: 500
+  -ins, --instance      launch multi-instance inference
+                        default: one instance per socket
+  -c, --core            number of cores per instance
+                        default: divide all physical cores evenly among instances
+
+example: resnet int8 performance benchmark on c5.24xlarge (dual sockets, 24 physical cores per socket):
+
+  bash ./launch_inference_mkldnn.sh -s ./model/resnet50_v1-quantized-5batches-naive-symbol.json
+
+will launch two instances for the throughput benchmark, each using 24 physical cores.
+```
+
+
+<h3 id='3'>ResNetV1</h3>

Review comment:
   We chose ResNetV1 as an example - previously this file contained more models, but the content was almost the same (only the model name changed). Moreover, only some models work with the script because of this issue: https://github.com/apache/incubator-mxnet/issues/19580, and GluonCV is not yet compatible with the master branch. We will add references when more models become available.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use
the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
