zixuanweeei commented on issue #18001: [MKLDNN] Support quantized rnn
URL: https://github.com/apache/incubator-mxnet/pull/18001#issuecomment-612741705
 
 
   > what's the performance?
   
   We have verified the accuracy and performance using a pre-trained language 
model provided by gluon-nlp ([a 
link](https://gluon-nlp.mxnet.io/examples/language_model/language_model.html#Using-a-pre-trained-AWD-LSTM-language-model)).
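   
   For reference, the measurement setup looks roughly like the sketch below. The exact calibration script isn't included in this comment, so the `quantize_net` arguments (calibration mode, number of calibration examples, and the `calib_data` iterator) are assumptions:
   
   ```python
   import mxnet as mx
   import gluonnlp as nlp
   from mxnet.contrib.quantization import quantize_net
   
   # Pre-trained AWD-LSTM language model from gluon-nlp (wikitext-2).
   model, vocab = nlp.model.get_model('awd_lstm_lm_1150',
                                      dataset_name='wikitext-2',
                                      pretrained=True)
   model.hybridize(static_alloc=True, static_shape=True)
   
   # Convert to an INT8 model; `calib_data` is a hypothetical DataIter
   # over a few validation batches, used only for calibration.
   qmodel = quantize_net(model, quantized_dtype='auto',
                         calib_mode='naive',
                         calib_data=calib_data,
                         num_calib_examples=10,
                         ctx=mx.cpu())
   ```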
   
   ### Accuracy (PPL, lower is better)
   |                    | FP32  | INT8  |
   |--------------------|-------|-------|
   | Validation dataset | 68.80 | 69.24 |
   | Test dataset       | 65.72 | 66.14 |
   
   The INT8 accuracy is very close to that of FP32.
   
   ### Performance
   #### Profiler Dumps of FP32 End-to-End 
   | Name                       | Total Count | Time (ms)     | Min Time (ms) | Max Time (ms) | Avg Time (ms) |
   |---------------------------:|------------:|--------------:|--------------:|--------------:|--------------:|
   | log_softmax                | 350         | 10968.93      | 31.09         | 31.54         | 31.34         |
   | RNN                        | 1050        | **5664.45**   | 3.13          | 7.37          | 5.39          |
   | _sg_mkldnn_fully_connected | 350         | 2630.26       | 7.40          | 7.78          | 7.52          |
   | _rnn_param_concat          | 1050        | 2392.41       | 0.94          | 3.73          | 2.28          |
   | Reshape                    | 4200        | 775.83        | 0.01          | 0.64          | 0.18          |
   | DeleteVariable             | 3856        | 185.39        | 0.00          | 0.53          | 0.05          |
   | CopyCPU2CPU                | 2450        | 48.89         | 0.01          | 0.05          | 0.02          |
   | Embedding                  | 350         | 21.29         | 0.06          | 0.08          | 0.06          |
   | WaitForVar                 | 2800        | 12.85         | 0.00          | 0.02          | 0.00          |
   | mean                       | 350         | 9.26          | 0.02          | 0.05          | 0.03          |
   | Dropout                    | 1400        | 8.38          | 0.00          | 0.01          | 0.01          |
   | sum                        | 350         | 6.85          | 0.02          | 0.04          | 0.02          |
   | pick                       | 350         | 6.55          | 0.02          | 0.03          | 0.02          |
   | _mul_scalar                | 350         | 3.56          | 0.01          | 0.02          | 0.01          |
   | _zeros                     | 6           | 0.16          | 0.01          | 0.07          | 0.03          |
   | Total                      |             | **22735.04**  |               |               |               |
   
   #### Profiler Dumps of INT8 End-to-End
   | Name                       | Total Count | Time (ms)     | Min Time (ms) | Max Time (ms) | Avg Time (ms) |
   |---------------------------:|------------:|--------------:|--------------:|--------------:|--------------:|
   | log_softmax                | 350         | 10805.84      | 30.72         | 35.89         | 30.87         |
   | _contrib_quantized_rnn     | 1050        | **2857.42**   | 1.52          | 3.81          | 2.72          |
   | _rnn_param_concat          | 1050        | 2375.36       | 0.83          | 5.93          | 2.26          |
   | _contrib_quantize_asym     | 1050        | 1580.61       | 0.55          | 4.87          | 1.51          |
   | _sg_mkldnn_fully_connected | 350         | 1559.83       | 4.42          | 4.65          | 4.46          |
   | Reshape                    | 4200        | 762.71        | 0.01          | 0.66          | 0.18          |
   | DeleteVariable             | 3856        | 131.79        | 0.00          | 0.44          | 0.03          |
   | CopyCPU2CPU                | 2450        | 48.68         | 0.01          | 0.06          | 0.02          |
   | Embedding                  | 350         | 21.03         | 0.06          | 0.08          | 0.06          |
   | WaitForVar                 | 2796        | 12.34         | 0.00          | 0.02          | 0.00          |
   | _contrib_quantize_v2       | 350         | 11.29         | 0.03          | 0.06          | 0.03          |
   | mean                       | 350         | 9.17          | 0.02          | 0.15          | 0.03          |
   | Dropout                    | 1400        | 8.31          | 0.00          | 0.01          | 0.01          |
   | sum                        | 350         | 6.63          | 0.02          | 0.04          | 0.02          |
   | pick                       | 350         | 6.22          | 0.02          | 0.03          | 0.02          |
   | _mul_scalar                | 350         | 3.67          | 0.01          | 0.03          | 0.01          |
   | _zeros                     | 6           | 0.11          | 0.01          | 0.07          | 0.02          |
   | Total                      |             | **20201.01**  |               |               |               |
   
   End-to-end latency got a ~1.13x speedup (22735.04 ms vs. 20201.01 ms), which is modest. However, `_contrib_quantized_rnn` achieved a ~2.0x speedup over `RNN`. Since `RNN` accounts for only ~25% of the total time, while `log_softmax` alone takes ~48%, the op-level gain of `_contrib_quantized_rnn` is diluted in the end-to-end number. In addition, `_contrib_quantize_asym` currently performs poorly and needs further optimization (WIP).
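   
   This modest end-to-end gain is what Amdahl's law predicts for a ~2x speedup on ~25% of the runtime. A quick sanity check with the profiler numbers above (a back-of-the-envelope sketch, not part of the benchmark script):
   
   ```python
   # Amdahl's law: end-to-end speedup when a fraction p of the runtime
   # is accelerated by a factor s.
   def amdahl(p, s):
       return 1.0 / ((1.0 - p) + p / s)
   
   total_fp32 = 22735.04            # ms, FP32 end-to-end total
   rnn_fp32   = 5664.45             # ms, time spent in RNN (FP32)
   rnn_int8   = 2857.42             # ms, time in _contrib_quantized_rnn (INT8)
   
   p = rnn_fp32 / total_fp32        # ~0.25: fraction of runtime in RNN
   s = rnn_fp32 / rnn_int8          # ~2.0x: op-level speedup
   
   print(f"predicted: {amdahl(p, s):.2f}x")         # ~1.14x
   print(f"measured:  {22735.04 / 20201.01:.2f}x")  # ~1.13x
   ```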
   
   Besides, the quantization flow of LSTM only moves the GEMM operations to INT8. The other computations, such as gate additions, bias additions, and element-wise activations, remain in FP32, so the speedup of `_contrib_quantized_rnn` cannot reach the 3\~4x one might expect; the sketch below illustrates the split.
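   
   Here is a schematic NumPy version of one LSTM step (a sketch only, not the actual MKL-DNN kernel; it uses symmetric per-tensor scales and omits the zero-point compensation that asymmetric quantization involves). Only the two GEMMs consume INT8 data; everything after dequantization runs in FP32:
   
   ```python
   import numpy as np
   
   def sigmoid(z):
       return 1.0 / (1.0 + np.exp(-z))
   
   def lstm_step_int8_sketch(x_q, h_q, Wx_q, Wh_q, bias, c_prev,
                             sx, sh, swx, swh):
       # INT8 x INT8 GEMMs with INT32 accumulation -- the only quantized math.
       acc_x = x_q.astype(np.int32) @ Wx_q.astype(np.int32)
       acc_h = h_q.astype(np.int32) @ Wh_q.astype(np.int32)
   
       # Dequantize with the per-tensor scales; from here on it is all FP32.
       gates = acc_x / (sx * swx) + acc_h / (sh * swh)
       gates = gates + bias                          # FP32 bias addition
       i, f, g, o = np.split(gates, 4, axis=-1)      # FP32 gate slicing
   
       c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)  # FP32 element-wise
       h = sigmoid(o) * np.tanh(c)                        # FP32 element-wise
       return h, c
   ```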
