Thanks, Jun, please see my comments inline. Wenting and Jin will follow up on the tasks in the PR.
From: Jun Wu [mailto:[email protected]]
Sent: Thursday, February 1, 2018 12:40 PM
To: [email protected]
Cc: Ye, Jason Y <[email protected]>; Lv, Tao A <[email protected]>; Jiang, Wenting <[email protected]>; Zhao, Patric <[email protected]>
Subject: Re: Intel Plan for the contribution to MXNET

Hi Patric,

Thanks for the contribution. It's great to see action on developing INT8 inference for CPU! I have a few questions and hope to have your answers.

1. When you said your work is aligned with PR #9552 (https://github.com/apache/incubator-mxnet/pull/9552), did you mean you used the quantization+calibration flow developed in that PR for benchmarking inference?
[Patric] The benchmark accuracy is based on MKL-DNN and Ziheng's old quantization branch. We have now merged to master (based on #8302) together with the quantization+calibration PR for INT8 development, and will share the accuracy and performance soon.

2. In your MNIST benchmark, what operators are quantized?
[Patric] Conv, relu and flatten are quantized in our MNIST benchmark (conv+relu+flatten+FC+softmax). Besides these, MKL-DNN supports pooling, concat and fused (conv with relu/elementwise/bn) INT8 ops.

3. Is the MNIST quantized model calibrated?
[Patric] Not yet. We ran the experiment on Ziheng's old quantization branch and are now moving to the branch of the quantization+calibration PR.

4. Is the INT8 inference accuracy produced by the calibrated quantized model, or by the quantized model without calibration?
[Patric] Without calibration.

5. What are the inference throughputs of the FP32 model and the INT8 model, respectively?
[Patric] At this stage we are mainly focusing on accuracy and the algorithm. Performance tuning is on the way ☺

Thanks,
Jun

On Wed, Jan 31, 2018 at 8:08 PM, Zhao, Patric <[email protected]> wrote:

Hi MXNET developers,

We are from the Intel Software and Services Group (SSG) and are working on performance optimization for MXNET on Intel Architecture (IA). Let me give a brief introduction to our ongoing projects. Any suggestions and comments are highly appreciated.

1) MKL-DNN integration with the new NNVM interface
We have designed a new MKL-DNN interface based on NNVM together with Zheng-Da. The new implementation shows better performance and flexibility than the old MKL engine. The PR is under review (https://github.com/apache/incubator-mxnet/pull/8302), and many thanks for your great comments in the thread :) After the PR is merged, we will push more MKL-DNN related features and performance optimization strategies, such as a fused conv + relu OP for inference.

2) INT8 inference
MKL-DNN also provides INT8 computation for ops such as conv, relu and pooling, which can improve inference performance significantly with only a slight accuracy drop (< 1%). Currently, we have implemented quantization, de-quantization and some computing ops in a local branch. Our latest implementation is aligned with PR #9552 (https://github.com/apache/incubator-mxnet/pull/9552) and passes the unit tests. For a simple network (conv+relu+flatten+FC+softmax) on the MNIST dataset, we got very similar inference accuracy (FP32 98.06% vs. INT8 97.93%). We will post a summary of our solution in that PR soon. I hope the CPU and GPU paths can stay compatible and share a common code base, so I think we need more discussion in the PR :)
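To make the INT8 setup above concrete, here is a minimal sketch (assuming the standard MXNet symbolic API) of the FP32 MNIST topology described above; the filter and hidden sizes are illustrative placeholders, not the values used in our benchmark:

    import mxnet as mx

    # FP32 reference network matching the benchmark topology:
    # conv + relu + flatten + FC + softmax.
    # num_filter/num_hidden are illustrative, not the benchmark's values.
    data = mx.sym.Variable('data')   # MNIST input, e.g. (batch, 1, 28, 28)
    conv = mx.sym.Convolution(data, kernel=(5, 5), num_filter=20, name='conv')
    relu = mx.sym.Activation(conv, act_type='relu', name='relu')
    flat = mx.sym.flatten(relu, name='flatten')
    fc   = mx.sym.FullyConnected(flat, num_hidden=10, name='fc')
    net  = mx.sym.SoftmaxOutput(fc, name='softmax')

In the INT8 path, conv, relu and flatten are replaced by their quantized counterparts with quantize/de-quantize ops inserted around them, while FC and softmax stay in FP32. For illustration only, the quantize/de-quantize pair can be thought of as a symmetric scaling like the following (the actual ops may use calibrated min/max ranges rather than the per-tensor max):

    import numpy as np

    x = np.random.randn(4, 4).astype(np.float32)   # FP32 activations/weights
    scale = 127.0 / np.max(np.abs(x))              # map max magnitude to 127
    x_q = np.clip(np.round(x * scale), -127, 127).astype(np.int8)   # quantize
    x_dq = x_q.astype(np.float32) / scale          # de-quantize back to FP32
    print('max abs quantization error:', np.max(np.abs(x - x_dq)))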
3) RNN implementations
Currently, there is no CPU implementation for mx.sym.rnn, and the pure Python implementation is really slow. We are working on resolving this issue from two aspects:

- Provide a C/C++ level implementation, registered via FCompute<cpu> (the GPU code should be moved to NNVM as well). We plan to submit a PR for LSTM/GRU in March. Our initial results are below, FYI (a short sketch of the two code paths follows at the end of this mail).

  Size: N = 12, T = 1600, I = 161, H = 1760 (from the first layer of Deep Speech 2)

  Forward             | mx.sym.gru bound to Intel GRU C (s) | Native mx.rnn.GRUCell (s)
  SKX 6148, 2 sockets | 1.32                                | 72.7

- Provide the MKL-DNN RNN interface (under development, https://github.com/intel/mkl-dnn/issues/46), registered via FComputeEx<cpu>. The higher-performance RNN is under development by the MKL-DNN team, and we will merge it when it is ready. I think CPU users can get a further performance boost from the MKL-DNN library.

Thanks in advance!

BR,

--
Patric
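As a footnote to the RNN numbers above, here is a minimal sketch (assuming the standard mx.rnn cell API) of how the two code paths in the table are constructed: the fused mx.sym.RNN op, which the planned FCompute<cpu> registration would implement for CPU, versus an explicitly unrolled mx.rnn.GRUCell. The shapes are the Deep Speech 2 sizes quoted above.

    import mxnet as mx

    # Shapes from the first layer of Deep Speech 2, as in the benchmark above.
    N, T, I, H = 12, 1600, 161, 1760   # batch, seq_len, input size, hidden size

    data = mx.sym.Variable('data')     # layout TNC: (T, N, I)

    # (a) Fused GRU: a single mx.sym.RNN node (mode='gru'); this op currently
    #     has no CPU kernel, which is what the FCompute<cpu> work above targets.
    fused = mx.rnn.FusedRNNCell(H, num_layers=1, mode='gru', prefix='gruf_')
    fused_out, _ = fused.unroll(T, inputs=data, layout='TNC', merge_outputs=True)

    # (b) Unrolled GRU: T explicit GRUCell steps in the symbolic graph; this is
    #     the "native mx.rnn.GRUCell" column in the table, much slower on CPU.
    cell = mx.rnn.GRUCell(H, prefix='gruc_')
    cell_out, _ = cell.unroll(T, inputs=data, layout='TNC', merge_outputs=True)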
