xinyu-intel commented on a change in pull request #12808: MKL-DNN Quantization Examples and README
URL: https://github.com/apache/incubator-mxnet/pull/12808#discussion_r225606819
########## File path: example/quantization/README.md ##########
@@ -1,4 +1,248 @@
# Model Quantization with Calibration Examples

This folder contains examples of quantizing a FP32 model with Intel® MKL-DNN or CUDNN.

<h2 id="0">Contents</h2>

* [1. Model Quantization with Intel® MKL-DNN](#1)
* [2. Model Quantization with CUDNN](#2)

<h2 id="1">Model Quantization with Intel® MKL-DNN</h2>

Intel® MKL-DNN supports quantization through the subgraph feature on Intel® CPU platforms and can bring significant performance improvements on the Intel® Xeon® Scalable platform. A new quantization script, `imagenet_gen_qsym_mkldnn.py`, has been designed to launch quantization for image-classification models with Intel® MKL-DNN. This script integrates with the [Gluon-CV model zoo](https://gluon-cv.mxnet.io/model_zoo/classification.html), so that more pre-trained models can be downloaded from Gluon-CV and converted for quantization. The script also supports custom models.

Use the command below to install Gluon-CV:

```
pip install gluoncv
```

The following models have been tested on Linux systems.

| Model | Source | Dataset | FP32 Accuracy | INT8 Accuracy |
|:---|:---|---|:---:|:---:|
| [ResNet50-V1](#3) | [Gluon-CV](https://gluon-cv.mxnet.io/model_zoo/classification.html) | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) | 75.87%/92.72% | 75.71%/92.65% |
| [SqueezeNet 1.0](#4) | [Gluon-CV](https://gluon-cv.mxnet.io/model_zoo/classification.html) | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) | 57.01%/79.71% | 56.62%/79.55% |
| [MobileNet 1.0](#5) | [Gluon-CV](https://gluon-cv.mxnet.io/model_zoo/classification.html) | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) | 69.76%/89.32% | 69.61%/89.09% |
| [ResNet152-V2](#6) | [MXNet ModelZoo](http://data.mxnet.io/models/imagenet/resnet/152-layers/) | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) | 76.76%/93.03% | 76.48%/92.96% |
| [Inception-BN](#7) | [MXNet ModelZoo](http://data.mxnet.io/models/imagenet/inception-bn/) | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) | 72.09%/90.60% | 72.00%/90.53% |
| [SSD-VGG](#8) | [example/ssd](https://github.com/apache/incubator-mxnet/tree/master/example/ssd) | VOC2007/2012 | 0.83 mAP | 0.82 mAP |

<h3 id='3'>ResNet50-V1</h3>

The following command downloads the pre-trained model from Gluon-CV and converts it into the symbolic model that will be quantized. The validation dataset is available [here](http://data.mxnet.io/data/val_256_q90.rec) for testing the pre-trained models:

```
python imagenet_gen_qsym_mkldnn.py --model=resnet50_v1 --num-calib-batches=5 --calib-mode=naive
```

The model will be fused and quantized automatically, and the quantized symbol and parameter files will be saved in the `./model` directory.
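Under the hood, the script drives MXNet's contrib quantization API. Below is a minimal sketch of that flow, assuming the MXNet 1.x `quantize_model` API; the checkpoint prefix, epoch, and iterator settings are illustrative, and the real script additionally applies MKL-DNN operator fusion and layer exclusions:

```python
import logging
import mxnet as mx
from mxnet.contrib.quantization import quantize_model

logging.basicConfig(level=logging.INFO)

# Load the converted FP32 model (prefix and epoch are illustrative).
sym, arg_params, aux_params = mx.model.load_checkpoint('./model/resnet50_v1', 0)

# A few batches of real data for calibration (the file downloaded above).
calib_data = mx.io.ImageRecordIter(path_imgrec='./data/val_256_q90.rec',
                                   batch_size=32, data_shape=(3, 224, 224))

# calib_mode='naive' records the min/max output of each layer over the
# calibration batches and uses them directly as quantization thresholds;
# calib_mode='entropy' would instead pick thresholds minimizing KL divergence.
qsym, qarg_params, aux_params = quantize_model(
    sym=sym, arg_params=arg_params, aux_params=aux_params, ctx=mx.cpu(),
    calib_mode='naive', calib_data=calib_data, num_calib_examples=32 * 5,
    logger=logging)

mx.model.save_checkpoint('./model/resnet50_v1-quantized', 0,
                         qsym, qarg_params, aux_params)
```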
Use the following commands to launch inference:

```
# USE MKLDNN AS SUBGRAPH BACKEND
export MXNET_SUBGRAPH_BACKEND=MKLDNN

# Launch FP32 Inference
python imagenet_inference.py --symbol-file=./model/resnet50_v1-symbol.json --param-file=./model/resnet50_v1-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec --ctx=cpu --data-nthreads=1

# Launch INT8 Inference
python imagenet_inference.py --symbol-file=./model/resnet50_v1-quantized-5batches-naive-symbol.json --param-file=./model/resnet50_v1-quantized-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec --ctx=cpu --data-nthreads=1

# Launch dummy data Inference
python imagenet_inference.py --symbol-file=./model/resnet50_v1-symbol.json --batch-size=64 --num-inference-batches=500 --ctx=cpu --benchmark=True
python imagenet_inference.py --symbol-file=./model/resnet50_v1-quantized-5batches-naive-symbol.json --batch-size=64 --num-inference-batches=500 --ctx=cpu --benchmark=True
```
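If you would rather embed inference in your own code than use `imagenet_inference.py`, the sketch below loads the saved files with the Module API. The file names follow the commands above, and the 3x224x224 input shape is an assumption matching these ImageNet models; remember to export `MXNET_SUBGRAPH_BACKEND=MKLDNN` first, as shown above:

```python
import mxnet as mx

# The quantized symbol and parameter files are saved under different
# prefixes (see the commands above), so load them individually instead
# of using mx.model.load_checkpoint.
sym = mx.sym.load('./model/resnet50_v1-quantized-5batches-naive-symbol.json')
save_dict = mx.nd.load('./model/resnet50_v1-quantized-0000.params')
arg_params = {k[4:]: v for k, v in save_dict.items() if k.startswith('arg:')}
aux_params = {k[4:]: v for k, v in save_dict.items() if k.startswith('aux:')}

# Bind a Module for CPU inference with batch size 1.
mod = mx.mod.Module(symbol=sym, context=mx.cpu(), label_names=None)
mod.bind(for_training=False, data_shapes=[('data', (1, 3, 224, 224))])
mod.set_params(arg_params, aux_params)

# Forward one dummy batch and inspect the output shape.
mod.forward(mx.io.DataBatch(data=[mx.nd.ones((1, 3, 224, 224))]),
            is_train=False)
print(mod.get_outputs()[0].shape)
```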
<h3 id='4'>SqueezeNet1.0</h3>

The following command downloads the pre-trained model from Gluon-CV and converts it into the symbolic model that will be quantized. The validation dataset is available [here](http://data.mxnet.io/data/val_256_q90.rec) for testing the pre-trained models:

```
python imagenet_gen_qsym_mkldnn.py --model=squeezenet1.0 --num-calib-batches=5 --calib-mode=naive
```

The model will be fused and quantized automatically, and the quantized symbol and parameter files will be saved in the `./model` directory. Use the following commands to launch inference:

```
# USE MKLDNN AS SUBGRAPH BACKEND
export MXNET_SUBGRAPH_BACKEND=MKLDNN

# Launch FP32 Inference
python imagenet_inference.py --symbol-file=./model/squeezenet1.0-symbol.json --param-file=./model/squeezenet1.0-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec --ctx=cpu --data-nthreads=1

# Launch INT8 Inference
python imagenet_inference.py --symbol-file=./model/squeezenet1.0-quantized-5batches-naive-symbol.json --param-file=./model/squeezenet1.0-quantized-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec --ctx=cpu --data-nthreads=1

# Launch dummy data Inference
python imagenet_inference.py --symbol-file=./model/squeezenet1.0-symbol.json --batch-size=64 --num-inference-batches=500 --ctx=cpu --benchmark=True
python imagenet_inference.py --symbol-file=./model/squeezenet1.0-quantized-5batches-naive-symbol.json --batch-size=64 --num-inference-batches=500 --ctx=cpu --benchmark=True
```

<h3 id='5'>MobileNet1.0</h3>

The following command downloads the pre-trained model from Gluon-CV and converts it into the symbolic model that will be quantized. The validation dataset is available [here](http://data.mxnet.io/data/val_256_q90.rec) for testing the pre-trained models:

```
python imagenet_gen_qsym_mkldnn.py --model=mobilenet1.0 --num-calib-batches=5 --calib-mode=naive
```

The model will be fused and quantized automatically, and the quantized symbol and parameter files will be saved in the `./model` directory. Use the following commands to launch inference:

```
# USE MKLDNN AS SUBGRAPH BACKEND
export MXNET_SUBGRAPH_BACKEND=MKLDNN

# Launch FP32 Inference
python imagenet_inference.py --symbol-file=./model/mobilenet1.0-symbol.json --param-file=./model/mobilenet1.0-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec --ctx=cpu --data-nthreads=1

# Launch INT8 Inference
python imagenet_inference.py --symbol-file=./model/mobilenet1.0-quantized-5batches-naive-symbol.json --param-file=./model/mobilenet1.0-quantized-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec --ctx=cpu --data-nthreads=1

# Launch dummy data Inference
python imagenet_inference.py --symbol-file=./model/mobilenet1.0-symbol.json --batch-size=64 --num-inference-batches=500 --ctx=cpu --benchmark=True
python imagenet_inference.py --symbol-file=./model/mobilenet1.0-quantized-5batches-naive-symbol.json --batch-size=64 --num-inference-batches=500 --ctx=cpu --benchmark=True
```

<h3 id='6'>ResNet152-V2</h3>

The following command downloads the pre-trained model from the [MXNet ModelZoo](http://data.mxnet.io/models/imagenet/resnet/152-layers/), which will then be quantized. The validation dataset is available [here](http://data.mxnet.io/data/val_256_q90.rec) for testing the pre-trained models:

```
python imagenet_gen_qsym_mkldnn.py --model=imagenet1k-resnet-152 --num-calib-batches=5 --calib-mode=naive
```

The model will be fused and quantized automatically, and the quantized symbol and parameter files will be saved in the `./model` directory. Use the following commands to launch inference:

```
# USE MKLDNN AS SUBGRAPH BACKEND
export MXNET_SUBGRAPH_BACKEND=MKLDNN

# Launch FP32 Inference
python imagenet_inference.py --symbol-file=./model/imagenet1k-resnet-152-symbol.json --param-file=./model/imagenet1k-resnet-152-0000.params --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec --ctx=cpu --data-nthreads=1

# Launch INT8 Inference
python imagenet_inference.py --symbol-file=./model/imagenet1k-resnet-152-quantized-5batches-naive-symbol.json --param-file=./model/imagenet1k-resnet-152-quantized-0000.params --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec --ctx=cpu --data-nthreads=1

# Launch dummy data Inference
python imagenet_inference.py --symbol-file=./model/imagenet1k-resnet-152-symbol.json --batch-size=64 --num-inference-batches=500 --ctx=cpu --benchmark=True
python imagenet_inference.py --symbol-file=./model/imagenet1k-resnet-152-quantized-5batches-naive-symbol.json --batch-size=64 --num-inference-batches=500 --ctx=cpu --benchmark=True
```

<h3 id='7'>Inception-BN</h3>

The following command downloads the pre-trained model from the [MXNet ModelZoo](http://data.mxnet.io/models/imagenet/inception-bn/), which will then be quantized. The validation dataset is available [here](http://data.mxnet.io/data/val_256_q90.rec) for testing the pre-trained models:

```
python imagenet_gen_qsym_mkldnn.py --model=imagenet1k-inception-bn --num-calib-batches=5 --calib-mode=naive
```

The model will be fused and quantized automatically, and the quantized symbol and parameter files will be saved in the `./model` directory. Use the following commands to launch inference:
```
# USE MKLDNN AS SUBGRAPH BACKEND
export MXNET_SUBGRAPH_BACKEND=MKLDNN

# Launch FP32 Inference
python imagenet_inference.py --symbol-file=./model/imagenet1k-inception-bn-symbol.json --param-file=./model/imagenet1k-inception-bn-0000.params --rgb-mean=123.68,116.779,103.939 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec --ctx=cpu --data-nthreads=1

# Launch INT8 Inference
python imagenet_inference.py --symbol-file=./model/imagenet1k-inception-bn-quantized-5batches-naive-symbol.json --param-file=./model/imagenet1k-inception-bn-quantized-0000.params --rgb-mean=123.68,116.779,103.939 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec --ctx=cpu --data-nthreads=1

# Launch dummy data Inference
python imagenet_inference.py --symbol-file=./model/imagenet1k-inception-bn-symbol.json --batch-size=64 --num-inference-batches=500 --ctx=cpu --benchmark=True
python imagenet_inference.py --symbol-file=./model/imagenet1k-inception-bn-quantized-5batches-naive-symbol.json --batch-size=64 --num-inference-batches=500 --ctx=cpu --benchmark=True
```

<h3 id='8'>SSD-VGG</h3>

Go to the [example/ssd](https://github.com/apache/incubator-mxnet/tree/master/example/ssd) directory. Follow the [instructions](https://github.com/apache/incubator-mxnet/tree/master/example/ssd#train-the-model) in [example/ssd](https://github.com/apache/incubator-mxnet/tree/master/example/ssd) to train a FP32 `SSD-VGG16_reduced_300x300` model on the Pascal VOC dataset. You can also download our [pre-trained model](http://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/ssd_vgg16_reduced_300-dd479559.zip) and [packed binary data](http://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/ssd-val-fc19a535.zip), then rename them and extract them into the `model/` and `data/` directories as below:

```
data/
|---val.rec
|---val.lst
|---val.idx
model/
|---ssd_vgg16_reduced_300.params
|---ssd_vgg16_reduced_300-symbol.json
```

Then, use the following command for quantization. By default, this script uses 5 batches (32 samples per batch) for naive calibration:

```
python quantization.py
```

After quantization, the INT8 models will be saved in the `model/` directory. Use the commands below to launch inference:

```
# USE MKLDNN AS SUBGRAPH BACKEND
export MXNET_SUBGRAPH_BACKEND=MKLDNN

# Launch FP32 Inference
python evaluate.py --cpu --num-batch 10 --batch-size 224 --deploy --prefix=./model/ssd_

# Launch INT8 Inference
python evaluate.py --cpu --num-batch 10 --batch-size 224 --deploy --prefix=./model/cqssd_

# Launch dummy data Inference
python benchmark_score.py --deploy --prefix=./model/ssd_
python benchmark_score.py --deploy --prefix=./model/cqssd_
```

<h3 id='9'>Custom Model</h3>

This script also supports custom symbolic models. You can easily add quantization-layer configs for your model in `imagenet_gen_qsym_mkldnn.py`, as below:

```
elif args.model == 'custom':
    # add the rgb mean/std of your model
    rgb_mean = '0,0,0'
    rgb_std = '0,0,0'
    calib_layer = lambda name: name.endswith('_output')
    # add the names of layers you do not want to quantize:
    # add conv/pool layer names that have negative inputs, since
    # Intel® MKL-DNN currently only supports uint8 quantization,
    # and add all fc layer names, since Intel® MKL-DNN does not
    # support quantizing them currently
    excluded_sym_names += ['layers']
    # add the name of your first conv layer, since Intel® MKL-DNN
    # currently only supports uint8 quantization
    if exclude_first_conv:
        excluded_sym_names += ['layers']
```
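To decide what goes into `excluded_sym_names`, it helps to list the node names of your graph first. Here is a small sketch; the symbol path is the one assumed in the tips below, and the `'fc'`/`'conv'`/`'pool'` substrings are illustrative, since naming depends on your model:

```python
import mxnet as mx

# Load your custom FP32 symbol.
sym = mx.sym.load('./model/custom-symbol.json')

# Print candidate layers: fc layers and conv/pool layers that may see
# negative inputs. Internal output names end with '_output'; strip that
# suffix to get the node name expected by excluded_sym_names.
for name in sym.get_internals().list_outputs():
    if any(key in name for key in ('fc', 'conv', 'pool')):
        print(name.replace('_output', ''))
```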
Some tips on quantization configs:

1. First, prepare your data, the symbol file (custom-symbol.json), and the parameter file (custom-0000.params) of your FP32 symbolic model.
2. Then, run the command below and make sure that your FP32 symbolic model runs inference well:

```
# USE MKLDNN AS SUBGRAPH BACKEND
export MXNET_SUBGRAPH_BACKEND=MKLDNN

# Launch FP32 Inference
python imagenet_inference.py --symbol-file=./model/custom-symbol.json --param-file=./model/custom-0000.params --rgb-mean=* --rgb-std=* --num-skipped-batches=* --batch-size=* --num-inference-batches=* --dataset=./data/* --ctx=cpu --data-nthreads=1
```

3. Then, add `rgb_mean`, `rgb_std` and `excluded_sym_names` in this script. Note that you should exclude conv/pool layers that receive negative data, since Intel® MKL-DNN currently only supports uint8 quantization. You should also exclude all fc layers in your model.

4. Then, run the command below for quantization:

```
python imagenet_gen_qsym_mkldnn.py --model=custom --num-calib-batches=5 --calib-mode=naive
```

5. After quantization, the INT8 symbol and parameters will be saved in the `model/` directory.

Review comment:
Change to "quantized". In fact, int8 means 8-bit integer, including unsigned int8 and signed int8. We will support both of them later.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
