xinyu-intel commented on a change in pull request #12808: MKL-DNN Quantization Examples and README
URL: https://github.com/apache/incubator-mxnet/pull/12808#discussion_r225606819
########## File path: example/quantization/README.md ##########
@@ -1,4 +1,248 @@
# Model Quantization with Calibration Examples

This folder contains examples of quantizing a FP32 model with Intel® MKL-DNN or CUDNN.

<h2 id="0">Contents</h2>

* [1. Model Quantization with Intel® MKL-DNN](#1)
* [2. Model Quantization with CUDNN](#2)

<h2 id="1">Model Quantization with Intel® MKL-DNN</h2>

Intel® MKL-DNN supports quantization through the subgraph feature on Intel® CPU platforms and can bring significant performance improvements on the Intel® Xeon® Scalable platform. A new quantization script, `imagenet_gen_qsym_mkldnn.py`, has been designed to launch quantization for image-classification models with Intel® MKL-DNN. This script integrates with the [Gluon-CV model zoo](https://gluon-cv.mxnet.io/model_zoo/classification.html), so that more pre-trained models can be downloaded from Gluon-CV and converted for quantization. The script also supports custom models.

Use the command below to install Gluon-CV:

```
pip install gluoncv
```

The following models have been tested on Linux systems.

| Model | Source | Dataset | FP32 Accuracy | INT8 Accuracy |
|:---|:---|---|:---:|:---:|
| [ResNet50-V1](#3) | [Gluon-CV](https://gluon-cv.mxnet.io/model_zoo/classification.html) | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) | 75.87%/92.72% | 75.71%/92.65% |
| [SqueezeNet 1.0](#4) | [Gluon-CV](https://gluon-cv.mxnet.io/model_zoo/classification.html) | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) | 57.01%/79.71% | 56.62%/79.55% |
| [MobileNet 1.0](#5) | [Gluon-CV](https://gluon-cv.mxnet.io/model_zoo/classification.html) | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) | 69.76%/89.32% | 69.61%/89.09% |
| [ResNet152-V2](#6) | [MXNet ModelZoo](http://data.mxnet.io/models/imagenet/resnet/152-layers/) | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) | 76.76%/93.03% | 76.48%/92.96% |
| [Inception-BN](#7) | [MXNet ModelZoo](http://data.mxnet.io/models/imagenet/inception-bn/) | [Validation Dataset](http://data.mxnet.io/data/val_256_q90.rec) | 72.09%/90.60% | 72.00%/90.53% |
| [SSD-VGG](#8) | [example/ssd](https://github.com/apache/incubator-mxnet/tree/master/example/ssd) | VOC2007/2012 | 0.83 mAP | 0.82 mAP |

<h3 id='3'>ResNet50-V1</h3>

The following command downloads the pre-trained model from Gluon-CV and converts it into the symbolic model that will be quantized. The validation dataset is available [here](http://data.mxnet.io/data/val_256_q90.rec) for testing the pre-trained models:

```
python imagenet_gen_qsym_mkldnn.py --model=resnet50_v1 --num-calib-batches=5 --calib-mode=naive
```

The model will be fused and quantized automatically, and the quantized symbol and parameter files will be saved in the `./model` directory.
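Under the hood, the script drives MXNet's contrib quantization API. Below is a minimal sketch of that flow, assuming the MXNet 1.x `quantize_model` API; the checkpoint prefix, epoch, and iterator settings are illustrative, and the real script additionally applies MKL-DNN operator fusion and layer exclusions:

```python
import logging
import mxnet as mx
from mxnet.contrib.quantization import quantize_model

logging.basicConfig(level=logging.INFO)

# Load the converted FP32 model (prefix and epoch are illustrative).
sym, arg_params, aux_params = mx.model.load_checkpoint('./model/resnet50_v1', 0)

# A few batches of real data for calibration (the file downloaded above).
calib_data = mx.io.ImageRecordIter(path_imgrec='./data/val_256_q90.rec',
                                   batch_size=32, data_shape=(3, 224, 224))

# calib_mode='naive' records the min/max output of each layer over the
# calibration batches and uses them directly as quantization thresholds;
# calib_mode='entropy' would instead pick thresholds minimizing KL divergence.
qsym, qarg_params, aux_params = quantize_model(
    sym=sym, arg_params=arg_params, aux_params=aux_params, ctx=mx.cpu(),
    calib_mode='naive', calib_data=calib_data, num_calib_examples=32 * 5,
    logger=logging)

mx.model.save_checkpoint('./model/resnet50_v1-quantized', 0,
                         qsym, qarg_params, aux_params)
```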
Use the following commands to launch inference:

```
# USE MKLDNN AS SUBGRAPH BACKEND
export MXNET_SUBGRAPH_BACKEND=MKLDNN

# Launch FP32 Inference
python imagenet_inference.py --symbol-file=./model/resnet50_v1-symbol.json --param-file=./model/resnet50_v1-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec --ctx=cpu --data-nthreads=1

# Launch INT8 Inference
python imagenet_inference.py --symbol-file=./model/resnet50_v1-quantized-5batches-naive-symbol.json --param-file=./model/resnet50_v1-quantized-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec --ctx=cpu --data-nthreads=1

# Launch dummy data Inference
python imagenet_inference.py --symbol-file=./model/resnet50_v1-symbol.json --batch-size=64 --num-inference-batches=500 --ctx=cpu --benchmark=True
python imagenet_inference.py --symbol-file=./model/resnet50_v1-quantized-5batches-naive-symbol.json --batch-size=64 --num-inference-batches=500 --ctx=cpu --benchmark=True
```
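If you would rather embed inference in your own code than use `imagenet_inference.py`, the sketch below loads the saved files with the Module API. The file names follow the commands above, and the 3x224x224 input shape is an assumption matching these ImageNet models; remember to export `MXNET_SUBGRAPH_BACKEND=MKLDNN` first, as shown above:

```python
import mxnet as mx

# The quantized symbol and parameter files are saved under different
# prefixes (see the commands above), so load them individually instead
# of using mx.model.load_checkpoint.
sym = mx.sym.load('./model/resnet50_v1-quantized-5batches-naive-symbol.json')
save_dict = mx.nd.load('./model/resnet50_v1-quantized-0000.params')
arg_params = {k[4:]: v for k, v in save_dict.items() if k.startswith('arg:')}
aux_params = {k[4:]: v for k, v in save_dict.items() if k.startswith('aux:')}

# Bind a Module for CPU inference with batch size 1.
mod = mx.mod.Module(symbol=sym, context=mx.cpu(), label_names=None)
mod.bind(for_training=False, data_shapes=[('data', (1, 3, 224, 224))])
mod.set_params(arg_params, aux_params)

# Forward one dummy batch and inspect the output shape.
mod.forward(mx.io.DataBatch(data=[mx.nd.ones((1, 3, 224, 224))]),
            is_train=False)
print(mod.get_outputs()[0].shape)
```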
<h3 id='4'>SqueezeNet1.0</h3>

The following command downloads the pre-trained model from Gluon-CV and converts it into the symbolic model that will be quantized. The validation dataset is available [here](http://data.mxnet.io/data/val_256_q90.rec) for testing the pre-trained models:

```
python imagenet_gen_qsym_mkldnn.py --model=squeezenet1.0 --num-calib-batches=5 --calib-mode=naive
```

The model will be fused and quantized automatically, and the quantized symbol and parameter files will be saved in the `./model` directory. Use the following commands to launch inference:

```
# USE MKLDNN AS SUBGRAPH BACKEND
export MXNET_SUBGRAPH_BACKEND=MKLDNN

# Launch FP32 Inference
python imagenet_inference.py --symbol-file=./model/squeezenet1.0-symbol.json --param-file=./model/squeezenet1.0-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec --ctx=cpu --data-nthreads=1

# Launch INT8 Inference
python imagenet_inference.py --symbol-file=./model/squeezenet1.0-quantized-5batches-naive-symbol.json --param-file=./model/squeezenet1.0-quantized-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec --ctx=cpu --data-nthreads=1

# Launch dummy data Inference
python imagenet_inference.py --symbol-file=./model/squeezenet1.0-symbol.json --batch-size=64 --num-inference-batches=500 --ctx=cpu --benchmark=True
python imagenet_inference.py --symbol-file=./model/squeezenet1.0-quantized-5batches-naive-symbol.json --batch-size=64 --num-inference-batches=500 --ctx=cpu --benchmark=True
```

<h3 id='5'>MobileNet1.0</h3>

The following command downloads the pre-trained model from Gluon-CV and converts it into the symbolic model that will be quantized. The validation dataset is available [here](http://data.mxnet.io/data/val_256_q90.rec) for testing the pre-trained models:

```
python imagenet_gen_qsym_mkldnn.py --model=mobilenet1.0 --num-calib-batches=5 --calib-mode=naive
```

The model will be fused and quantized automatically, and the quantized symbol and parameter files will be saved in the `./model` directory. Use the following commands to launch inference:

```
# USE MKLDNN AS SUBGRAPH BACKEND
export MXNET_SUBGRAPH_BACKEND=MKLDNN

# Launch FP32 Inference
python imagenet_inference.py --symbol-file=./model/mobilenet1.0-symbol.json --param-file=./model/mobilenet1.0-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec --ctx=cpu --data-nthreads=1

# Launch INT8 Inference
python imagenet_inference.py --symbol-file=./model/mobilenet1.0-quantized-5batches-naive-symbol.json --param-file=./model/mobilenet1.0-quantized-0000.params --rgb-mean=123.68,116.779,103.939 --rgb-std=58.393,57.12,57.375 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec --ctx=cpu --data-nthreads=1

# Launch dummy data Inference
python imagenet_inference.py --symbol-file=./model/mobilenet1.0-symbol.json --batch-size=64 --num-inference-batches=500 --ctx=cpu --benchmark=True
python imagenet_inference.py --symbol-file=./model/mobilenet1.0-quantized-5batches-naive-symbol.json --batch-size=64 --num-inference-batches=500 --ctx=cpu --benchmark=True
```

<h3 id='6'>ResNet152-V2</h3>

The following command downloads the pre-trained model from the [MXNet ModelZoo](http://data.mxnet.io/models/imagenet/resnet/152-layers/), which will then be quantized. The validation dataset is available [here](http://data.mxnet.io/data/val_256_q90.rec) for testing the pre-trained models:

```
python imagenet_gen_qsym_mkldnn.py --model=imagenet1k-resnet-152 --num-calib-batches=5 --calib-mode=naive
```

The model will be fused and quantized automatically, and the quantized symbol and parameter files will be saved in the `./model` directory. Use the following commands to launch inference:

```
# USE MKLDNN AS SUBGRAPH BACKEND
export MXNET_SUBGRAPH_BACKEND=MKLDNN

# Launch FP32 Inference
python imagenet_inference.py --symbol-file=./model/imagenet1k-resnet-152-symbol.json --param-file=./model/imagenet1k-resnet-152-0000.params --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec --ctx=cpu --data-nthreads=1

# Launch INT8 Inference
python imagenet_inference.py --symbol-file=./model/imagenet1k-resnet-152-quantized-5batches-naive-symbol.json --param-file=./model/imagenet1k-resnet-152-quantized-0000.params --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec --ctx=cpu --data-nthreads=1

# Launch dummy data Inference
python imagenet_inference.py --symbol-file=./model/imagenet1k-resnet-152-symbol.json --batch-size=64 --num-inference-batches=500 --ctx=cpu --benchmark=True
python imagenet_inference.py --symbol-file=./model/imagenet1k-resnet-152-quantized-5batches-naive-symbol.json --batch-size=64 --num-inference-batches=500 --ctx=cpu --benchmark=True
```

<h3 id='7'>Inception-BN</h3>

The following command downloads the pre-trained model from the [MXNet ModelZoo](http://data.mxnet.io/models/imagenet/inception-bn/), which will then be quantized. The validation dataset is available [here](http://data.mxnet.io/data/val_256_q90.rec) for testing the pre-trained models:

```
python imagenet_gen_qsym_mkldnn.py --model=imagenet1k-inception-bn --num-calib-batches=5 --calib-mode=naive
```

The model will be fused and quantized automatically, and the quantized symbol and parameter files will be saved in the `./model` directory. Use the following commands to launch inference:
```
# USE MKLDNN AS SUBGRAPH BACKEND
export MXNET_SUBGRAPH_BACKEND=MKLDNN

# Launch FP32 Inference
python imagenet_inference.py --symbol-file=./model/imagenet1k-inception-bn-symbol.json --param-file=./model/imagenet1k-inception-bn-0000.params --rgb-mean=123.68,116.779,103.939 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec --ctx=cpu --data-nthreads=1

# Launch INT8 Inference
python imagenet_inference.py --symbol-file=./model/imagenet1k-inception-bn-quantized-5batches-naive-symbol.json --param-file=./model/imagenet1k-inception-bn-quantized-0000.params --rgb-mean=123.68,116.779,103.939 --num-skipped-batches=50 --batch-size=64 --num-inference-batches=500 --dataset=./data/val_256_q90.rec --ctx=cpu --data-nthreads=1

# Launch dummy data Inference
python imagenet_inference.py --symbol-file=./model/imagenet1k-inception-bn-symbol.json --batch-size=64 --num-inference-batches=500 --ctx=cpu --benchmark=True
python imagenet_inference.py --symbol-file=./model/imagenet1k-inception-bn-quantized-5batches-naive-symbol.json --batch-size=64 --num-inference-batches=500 --ctx=cpu --benchmark=True
```

<h3 id='8'>SSD-VGG</h3>

Go to the [example/ssd](https://github.com/apache/incubator-mxnet/tree/master/example/ssd) directory. Follow the [instructions](https://github.com/apache/incubator-mxnet/tree/master/example/ssd#train-the-model) in [example/ssd](https://github.com/apache/incubator-mxnet/tree/master/example/ssd) to train a FP32 `SSD-VGG16_reduced_300x300` model on the Pascal VOC dataset. You can also download our [pre-trained model](http://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/ssd_vgg16_reduced_300-dd479559.zip) and [packed binary data](http://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/ssd-val-fc19a535.zip), then rename them and extract them into the `model/` and `data/` directories as below:

```
data/
|---val.rec
|---val.lst
|---val.idx
model/
|---ssd_vgg16_reduced_300.params
|---ssd_vgg16_reduced_300-symbol.json
```

Then, use the following command for quantization. By default, this script uses 5 batches (32 samples per batch) for naive calibration:

```
python quantization.py
```

After quantization, the INT8 models will be saved in the `model/` directory. Use the commands below to launch inference:

```
# USE MKLDNN AS SUBGRAPH BACKEND
export MXNET_SUBGRAPH_BACKEND=MKLDNN

# Launch FP32 Inference
python evaluate.py --cpu --num-batch 10 --batch-size 224 --deploy --prefix=./model/ssd_

# Launch INT8 Inference
python evaluate.py --cpu --num-batch 10 --batch-size 224 --deploy --prefix=./model/cqssd_

# Launch dummy data Inference
python benchmark_score.py --deploy --prefix=./model/ssd_
python benchmark_score.py --deploy --prefix=./model/cqssd_
```

<h3 id='9'>Custom Model</h3>

This script also supports custom symbolic models. You can easily add quantization-layer configs for your model in `imagenet_gen_qsym_mkldnn.py`, as below:

```
elif args.model == 'custom':
    # add the rgb mean/std of your model
    rgb_mean = '0,0,0'
    rgb_std = '0,0,0'
    calib_layer = lambda name: name.endswith('_output')
    # add the names of layers you do not want to quantize:
    # add conv/pool layer names that have negative inputs, since
    # Intel® MKL-DNN currently only supports uint8 quantization,
    # and add all fc layer names, since Intel® MKL-DNN does not
    # support quantizing them currently
    excluded_sym_names += ['layers']
    # add the name of your first conv layer, since Intel® MKL-DNN
    # currently only supports uint8 quantization
    if exclude_first_conv:
        excluded_sym_names += ['layers']
```
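To decide what goes into `excluded_sym_names`, it helps to list the node names of your graph first. Here is a small sketch; the symbol path is the one assumed in the tips below, and the `'fc'`/`'conv'`/`'pool'` substrings are illustrative, since naming depends on your model:

```python
import mxnet as mx

# Load your custom FP32 symbol.
sym = mx.sym.load('./model/custom-symbol.json')

# Print candidate layers: fc layers and conv/pool layers that may see
# negative inputs. Internal output names end with '_output'; strip that
# suffix to get the node name expected by excluded_sym_names.
for name in sym.get_internals().list_outputs():
    if any(key in name for key in ('fc', 'conv', 'pool')):
        print(name.replace('_output', ''))
```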
Some tips on quantization configs:

1. First, prepare your data, the symbol file (custom-symbol.json), and the parameter file (custom-0000.params) of your FP32 symbolic model.
2. Then, run the command below and make sure that your FP32 symbolic model runs inference well:

```
# USE MKLDNN AS SUBGRAPH BACKEND
export MXNET_SUBGRAPH_BACKEND=MKLDNN

# Launch FP32 Inference
python imagenet_inference.py --symbol-file=./model/custom-symbol.json --param-file=./model/custom-0000.params --rgb-mean=* --rgb-std=* --num-skipped-batches=* --batch-size=* --num-inference-batches=* --dataset=./data/* --ctx=cpu --data-nthreads=1
```

3. Then, add `rgb_mean`, `rgb_std` and `excluded_sym_names` in this script. Note that you should exclude conv/pool layers that receive negative data, since Intel® MKL-DNN currently only supports uint8 quantization. You should also exclude all fc layers in your model.

4. Then, run the command below for quantization:

```
python imagenet_gen_qsym_mkldnn.py --model=custom --num-calib-batches=5 --calib-mode=naive
```

5. After quantization, the INT8 symbol and parameters will be saved in the `model/` directory.

Review comment:
Change to "quantized". In fact, int8 means 8-bit integer, including unsigned int8 and signed int8. We will support both of them later.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
