juliusshufan edited a comment on issue #12591: USE_MKLDNN=1 is default in make 
build (mkldnn must be explicitly turned off)
URL: https://github.com/apache/incubator-mxnet/pull/12591#issuecomment-424968127
 
 
   **RNN-related data, covering both accuracy and performance/benchmarking.**
   **Accuracy**
   1. **_A GNMT model_** implemented in gluon-nlp (scripts/nmt/train_gnmt.py), trained on the IWSLT2015 dataset for en-vi translation. The encoder-decoder is a 2-layer LSTM. Because the model is implemented with gluon.rnncell, which is an unfused kernel, the MKL-DNN FC primitive is exercised. The figure below shows the perplexity (ppl) trends collected on both GPU and CPU with the same hyper-parameters; the two curves align very well.
   
![image](https://user-images.githubusercontent.com/33112206/46126432-d4a40200-c25f-11e8-8d03-8f0cfcd9712c.png)
   2. **_A simple RNN model_** from the official MXNet repo (/example/rnn/bucketing), implemented with the RNN symbol API. Training uses a 3-layer LSTM/GRU model with the fused RNN kernel on both CPU and GPU, and the training curves are compared.
   
   **Benchmarking**
   Thanks to the new Gluon RNN API features released in MXNet 1.3.0, dummy-data benchmarks were run with both the fused and unfused Gluon RNN APIs, using MXNet with MKL-DNN as the backend.
   The benchmarks use a series of predefined input shapes on a 1-socket SKX-8180 CPU with 28 cores and 192 GB DDR4 memory. (The input size is the embedding size, and equals the hidden size by default.)
   The metric is Sentences Per Second (SPS).
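   As an illustration, SPS for a given input shape can be measured with a simple timing loop like the one below. This is a minimal, framework-agnostic sketch: the `run_batch` callable and the dummy workload are placeholders standing in for the actual fused/unfused Gluon RNN forward pass.

```python
import time

def measure_sps(run_batch, batch_size, n_batches=10, warmup=2):
    """Measure throughput in Sentences Per Second (SPS).

    run_batch: callable executing one forward pass over a batch
               (stand-in for the fused/unfused Gluon RNN call).
    """
    for _ in range(warmup):          # warm-up iterations are discarded
        run_batch()
    start = time.time()
    for _ in range(n_batches):
        run_batch()
    elapsed = time.time() - start
    return n_batches * batch_size / elapsed

# Placeholder workload: burn a little CPU instead of a real RNN forward pass.
def dummy_batch():
    sum(i * i for i in range(10000))

sps = measure_sps(dummy_batch, batch_size=64)
print(f"{sps:.1f} SPS")
```

   In the real benchmark, `run_batch` would invoke `gluon.rnn.LSTM`/`gluon.rnn.GRU` (fused) or the corresponding per-step cells (unfused) on the dummy input of the given shape.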
   
   **1-layer LSTM, fused vs. unfused**
   
   Input Shape (N, T, C, Input Size) | Fused (SPS) | Unfused (SPS) | Boost
   -- | -- | -- | --
   [64, 15, 500, 500] | 2917.237852 | 1667.527 | 174.94%
   [64, 20, 500, 500] | 3661.45311 | 1196.497 | 306.01%
   [64, 25, 500, 500] | 3288.546223 | 855.2861 | 384.50%
   [64, 30, 500, 500] | 2913.375177 | 660.5786 | 441.03%
   [64, 35, 500, 500] | 2581.44028 | 519.6848 | 496.73%
   [64, 40, 500, 500] | 2479.42023 | 714.7851 | 346.88%
   [64, 45, 500, 500] | 2300.442591 | 625.1124 | 368.00%
   [64, 50, 500, 500] | 2160.407494 | 549.2164 | 393.36%
   [16, 25, 512, 512] | 1067.593284 | 332.028 | 321.54%
   [32, 25, 512, 512] | 1830.461068 | 649.8168 | 281.69%
   [64, 25, 512, 512] | 2827.429465 | 1187.243 | 238.15%
   [128, 25, 512, 512] | 3938.397784 | 1547.932 | 254.43%
   [16, 25, 1024, 1024] | 231.900727 | 154.7335 | 149.87%
   [32, 25, 1024, 1024] | 429.570455 | 298.2182 | 144.05%
   [64, 25, 1024, 1024] | 744.384772 | 480.4162 | 154.95%
   [128, 25, 1024, 1024] | 1204.706856 | 696.3014 | 173.02%
   [16, 25, 2048, 2048] | 52.323166 | 40.81776 | 128.19%
   [32, 25, 2048, 2048] | 101.108405 | 78.72398 | 128.43%
   [64, 25, 2048, 2048] | 181.117374 | 131.4923 | 137.74%
   [128, 25, 2048, 2048] | 315.360515 | 223.4272 | 141.15%
   [16, 25, 4096, 4096] | 12.326611 | 9.575337 | 128.73%
   [32, 25, 4096, 4096] | 24.255487 | 18.75816 | 129.31%
   [64, 25, 4096, 4096] | 44.229753 | 34.00344 | 130.07%
   [128, 25, 4096, 4096] | 78.146907 | 64.36427 | 121.41%
   
   **1-layer GRU, fused vs. unfused**
   
   Input Shape (N, T, C, Input Size) | Fused (SPS) | Unfused (SPS) | Boost
   -- | -- | -- | --
   [64, 15, 500, 500] | 3981.266 | 1714.903 | 232.16%
   [64, 20, 500, 500] | 3686.065 | 1316.712 | 279.94%
   [64, 25, 500, 500] | 3430.645 | 930.4283 | 368.72%
   [64, 30, 500, 500] | 3130.724 | 722.1599 | 433.52%
   [64, 35, 500, 500] | 2982.695 | 692.9842 | 430.41%
   [64, 40, 500, 500] | 2857.4 | 621.988 | 459.40%
   [64, 45, 500, 500] | 2598.724 | 533.6256 | 486.99%
   [64, 50, 500, 500] | 2364.662 | 498.7772 | 474.09%
   [16, 25, 512, 512] | 1066.644 | 278.212 | 383.39%
   [32, 25, 512, 512] | 1861.235 | 540.8459 | 344.13%
   [64, 25, 512, 512] | 3089.303 | 1020.799 | 302.64%
   [128, 25, 512, 512] | 4679.54 | 1636.657 | 285.92%
   [16, 25, 1024, 1024] | 317.5073 | 163.0825 | 194.69%
   [32, 25, 1024, 1024] | 584.9791 | 318.4931 | 183.67%
   [64, 25, 1024, 1024] | 1051.927 | 552.1558 | 190.51%
   [128, 25, 1024, 1024] | 1568.747 | 814.037 | 192.71%
   [16, 25, 2048, 2048] | 64.3481 | 50.81243 | 126.64%
   [32, 25, 2048, 2048] | 124.1267 | 99.61789 | 124.60%
   [64, 25, 2048, 2048] | 227.109 | 170.9884 | 132.82%
   [128, 25, 2048, 2048] | 376.7918 | 279.1985 | 134.95%
   [16, 25, 4096, 4096] | 14.59219 | 12.47552 | 116.97%
   [32, 25, 4096, 4096] | 28.75226 | 24.61517 | 116.81%
   [64, 25, 4096, 4096] | 52.63095 | 44.60013 | 118.01%
   [128, 25, 4096, 4096] | 95.56435 | 83.10091 | 115.00%
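   For reference, the Boost column in both tables is the ratio of fused to unfused SPS expressed as a percentage. A quick check against the first row of each table:

```python
def boost(fused, unfused):
    """Boost = fused SPS / unfused SPS, as a percentage."""
    return 100.0 * fused / unfused

# First rows of the LSTM and GRU tables above, shape [64, 15, 500, 500].
lstm_boost = boost(2917.237852, 1667.527)
gru_boost = boost(3981.266, 1714.903)
print(f"LSTM: {lstm_boost:.2f}%  GRU: {gru_boost:.2f}%")
```

   These reproduce the reported 174.94% and 232.16% figures.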
   
   
