Thanks, Jun, please see my comments inline.

Wenting and Jin will follow up the tasks in the PR.

From: Jun Wu [mailto:[email protected]]
Sent: Thursday, February 1, 2018 12:40 PM
To: [email protected]
Cc: Ye, Jason Y <[email protected]>; Lv, Tao A <[email protected]>; Jiang, 
Wenting <[email protected]>; Zhao, Patric <[email protected]>
Subject: Re: Intel Plan for the contribution to MXNET

Hi Patric,

Thanks for the contribution. It’s great to see actions on developing INT8 
inference for CPU! I have a few questions and hope to have your answers.


1.      When you said your work is aligned with 
PR9552<https://github.com/apache/incubator-mxnet/pull/9552>, did you mean you 
used the quantization+calibration flow developed in that PR for benchmarking 
inference?

[Patric] The benchmark accuracy is based on MKL-DNN and ziheng’s old 
quantization branch.

Now we have merged to master (based on #8302) together with the 
quantization+calibration PR for INT8 development, and we will show you the 
accuracy and performance soon.



2.      In your MNIST benchmark, what operators are quantized?

[Patric] Conv, relu, and flatten are quantized in our MNIST benchmark 
(conv+relu+flatten+FC+softmax).

Besides, MKL-DNN supports pooling, concat, and fused (conv with 
relu/elementwise/BN) INT8 ops.
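
For reference, a minimal sketch of that benchmark network in MXNet's symbolic 
API (the layer sizes here are illustrative placeholders, not the exact 
configuration we benchmarked):

    import mxnet as mx

    # conv+relu+flatten+FC+softmax, as in the MNIST benchmark above.
    # The conv, relu, and flatten layers are the ones replaced by quantized ops.
    data = mx.sym.Variable('data')                                   # (N, 1, 28, 28)
    conv = mx.sym.Convolution(data, kernel=(5, 5), num_filter=20, name='conv')
    relu = mx.sym.Activation(conv, act_type='relu', name='relu')
    flat = mx.sym.Flatten(relu, name='flatten')
    fc   = mx.sym.FullyConnected(flat, num_hidden=10, name='fc')
    net  = mx.sym.SoftmaxOutput(fc, name='softmax')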



3.      Is the MNIST quantized model calibrated?

[Patric] Not yet. We did the experiment on ziheng’s old quantization branch; 
now we are moving to the branch from the quantization+calibration PR.



4.      Is the inference accuracy of INT8 produced by the calibrated quantized 
model, or just the quantized model without calibration?

[Patric] Without calibration
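
For context, once we rebase onto the quantization+calibration PR, the calibrated 
flow would look roughly like the sketch below (the quantize_model helper, its 
import path, argument names, and CPU-context support are assumptions based on 
our reading of that PR; the checkpoint prefix and random calibration data are 
placeholders):

    import mxnet as mx
    import numpy as np
    # Assumed import path from the quantization+calibration PR.
    from mxnet.contrib.quantization import quantize_model

    # Trained FP32 MNIST model from the benchmark above ('mnist' prefix is a placeholder).
    sym, arg_params, aux_params = mx.model.load_checkpoint('mnist', 0)

    # Calibration data: normally a held-out slice of the training set (random stand-in here).
    calib_data = mx.io.NDArrayIter(np.random.uniform(0, 1, (512, 1, 28, 28)),
                                   np.random.randint(0, 10, (512,)),
                                   batch_size=32)

    # Quantize and calibrate; argument names and calib_mode values follow the PR as we read it.
    qsym, qarg_params, qaux_params = quantize_model(
        sym=sym, arg_params=arg_params, aux_params=aux_params,
        ctx=mx.cpu(),
        calib_mode='entropy',        # 'naive' would use simple min/max thresholds instead
        calib_data=calib_data,
        num_calib_examples=512)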



5.      What are the throughputs of FP32 model and INT8 model for inference, 
respectively?

[Patric] At this stage, we are mainly focused on the accuracy and the 
algorithm. The performance fine-tuning is on the way ☺

Thanks,
Jun

On Wed, Jan 31, 2018 at 8:08 PM, Zhao, Patric 
<[email protected]<mailto:[email protected]>> wrote:
Hi MXNET developers,

We are from Intel Software and Service Group (SSG) and working on the 
performance optimization for MXNET on Intel Architecture (IA).

Let me do a simple introduction about our ongoing projects.

Any suggestions and comments are highly appreciated.


1)      MKL-DNN integration with the new NNVM interface

We have designed a new MKL-DNN interface based on NNVM together with Zheng-Da.

The new implementation shows better performance and flexibility than the old 
MKL engine.



The PR is under review (https://github.com/apache/incubator-mxnet/pull/8302), 
and many thanks for your great comments in the thread :)

After the PR is merged, we will push more MKL-DNN related features and 
performance optimization strategies, such as a fused conv + relu OP for 
inference.



2)      INT8 inference

MKL-DNN also provides INT8 calculations for ops such as conv, relu, and 
pooling, which can improve the inference performance a lot with only a very 
slight accuracy drop (<1%).

Currently, we have implemented quantization, de-quantization, and some 
computing Ops in a local branch.

Our latest implementation is aligned with this PR 
(https://github.com/apache/incubator-mxnet/pull/9552) and passed the unit test.
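
As a small illustration of the quantize/de-quantize ops, a minimal round-trip 
sketch (the contrib op names, signatures, and supported out_type values are 
assumed from that PR and may differ in our local branch):

    import mxnet as mx

    # Quantize an FP32 tensor to 8 bits using its observed min/max, then de-quantize it back.
    x = mx.nd.uniform(-1.0, 1.0, shape=(2, 3))
    min_r, max_r = mx.nd.min(x), mx.nd.max(x)

    xq, xq_min, xq_max = mx.nd.contrib.quantize(x, min_r, max_r, out_type='uint8')
    xd = mx.nd.contrib.dequantize(xq, xq_min, xq_max, out_type='float32')

    print(x)
    print(xd)   # close to x, up to 8-bit quantization error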



For a simple network (conv+relu+flatten+FC+softmax) on the MNIST dataset, we 
got very similar inference accuracy (FP32: 98.06% vs. INT8: 97.93%).

We will post a summary of our solution in this PR soon.



I hope both the CPU and GPU implementations can be compatible and share a 
common code base. So, I think we need more discussion in the PR :)



3)      RNN implementations

Currently, there is no CPU implementation for mx.sym.rnn, and the Python 
implementation is really slow.

We are working on resolving this issue from two aspects:

-          Provide the C/C++ level implementation, registered via FCompute<cpu> 
(GPU code should be moved to NNVM as well).

We plan to submit a PR for LSTM/GRU in March; our initial results are below, 
FYI (a rough sketch of the benchmark setup appears at the end of this section).
            Size: N = 12, T = 1600, I = 161, H = 1760 (from the first layer of 
Deep Speech 2)

Forward                 mx.sym.gru bound to Intel GRU C (s)   Native mx.rnn.GRUCell (s)
SKX 6148, 2 sockets     1.32                                  72.7

-          Provide the MKL-DNN RNN interface (under development, 
https://github.com/intel/mkl-dnn/issues/46), registered via FComputeEx<cpu>

A higher-performance RNN is under development by the MKL-DNN team, and we will 
integrate it when it's ready.

I think CPU users can get a further performance boost from the MKL-DNN library.
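
As mentioned above, a rough sketch of the GRU forward-timing setup (a minimal 
sketch, not our exact harness: it assumes the fused mx.sym.RNN path is backed 
by the CPU kernel we are proposing, and uses mx.rnn.FusedRNNCell / 
mx.rnn.GRUCell from the current Python API):

    import time
    import numpy as np
    import mxnet as mx

    # Sizes from the first layer of Deep Speech 2, as in the table above.
    N, T, I, H = 12, 1600, 161, 1760

    def time_forward(sym, data_shape):
        # Bind with random input and time a single warm forward pass.
        exe = sym.simple_bind(ctx=mx.cpu(), data=data_shape, grad_req='null')
        exe.arg_dict['data'][:] = np.random.uniform(-1, 1, data_shape)
        exe.forward(is_train=False); mx.nd.waitall()     # warm-up
        start = time.time()
        exe.forward(is_train=False); mx.nd.waitall()
        return time.time() - start

    data = mx.sym.Variable('data')                       # layout TNC: (T, N, I)

    # Fused GRU: lowers to mx.sym.RNN, which the proposed FCompute<cpu> kernel would back.
    fused_out, _ = mx.rnn.FusedRNNCell(H, num_layers=1, mode='gru', prefix='fgru_') \
                        .unroll(T, inputs=data, layout='TNC', merge_outputs=True)

    # Native per-step GRUCell unrolled in Python, for comparison.
    cell_out, _ = mx.rnn.GRUCell(H, prefix='gru_') \
                       .unroll(T, inputs=data, layout='TNC', merge_outputs=True)

    print('fused mx.sym.gru forward (s):', time_forward(fused_out, (T, N, I)))
    print('mx.rnn.GRUCell forward (s):  ', time_forward(cell_out, (T, N, I)))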

     Thanks in advance!

     BR,

    -- Patric
