**RNN-related data, covering both accuracy and performance benchmarking.**

**Accuracy**
1. **_A GNMT model_** implemented by gluon-nlp (scripts\nmt\train_gnmt.py), 
trained on the IWSLT2015 dataset for en-vi translation. The encoder and decoder 
are 2-layer LSTMs. Because the model implementation uses gluon.rnncell, which is 
an unfused kernel, this test also covers the MKLDNN FC path. The figure below 
shows the perplexity (ppl) trends collected on both GPU and CPU with the same 
hyper-parameters; the two curves align very well.
![image](https://user-images.githubusercontent.com/33112206/46126432-d4a40200-c25f-11e8-8d03-8f0cfcd9712c.png)
2. **_A simple RNN model_**, provided by the official MXNet repo 
(/example/rnn/bucketing) and implemented with the RNN symbol API. The training 
tests use a 3-layer LSTM/GRU RNN model with the fused RNN kernel on CPU and GPU, 
and compare the training curves.

**Benchmarking**

Thanks to the new features released in MXNet 1.3.0 for the Gluon RNN API, 
dummy-data-based benchmarks were run using the fused and unfused Gluon RNN APIs 
respectively, with MXNet built with MKLDNN as the backend.
The benchmarks use a series of predefined input shapes on a 1S-SKX8180 CPU with 
28 cores and 192G DDR4 memory. (The input size is the embedding size, which by 
default equals the hidden size.) The metric is **_S_**entences **_P_**er **_S_**econd.
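
For readers who want to reproduce the metric, here is a minimal sketch of how a sentences-per-second figure can be measured. It is not the script used for the numbers in this comment: the helper name `sentences_per_second`, the warm-up and iteration counts, and the dummy workload are all illustrative assumptions. Any callable that runs one forward pass over a batch of N sentences can be passed as `run_batch`.

```python
import time


def sentences_per_second(run_batch, batch_size, num_iters=20, warmup=5):
    """Return throughput of `run_batch` in sentences per second.

    `run_batch` is a hypothetical zero-argument callable that processes one
    batch of `batch_size` sentences. It must block until the computation has
    really finished; with MXNet's asynchronous engine that means calling
    mx.nd.waitall() before it returns.
    """
    # Warm-up iterations are excluded from the timing.
    for _ in range(warmup):
        run_batch()

    start = time.perf_counter()
    for _ in range(num_iters):
        run_batch()
    elapsed = time.perf_counter() - start

    return batch_size * num_iters / elapsed


def dummy_batch():
    # Stand-in workload so the sketch runs without MXNet installed.
    sum(i * i for i in range(10_000))


sps = sentences_per_second(dummy_batch, batch_size=64)
print(f"{sps:.1f} sentences/s")
```

Under these assumptions, the same harness would be run once with a forward pass through the fused layer and once with the unfused cells for each input shape.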

**1-layer LSTM fused vs. unfused**

Input Shape (N, T, C, Input Size) | Fused (sentences/s) | Unfused (sentences/s) | Boost (Fused / Unfused)
-- | -- | -- | --
[64, 15, 500, 500] | 2917.237852 | 1667.527 | 174.94%
[64, 20, 500, 500] | 3661.45311 | 1196.497 | 306.01%
[64, 25, 500, 500] | 3288.546223 | 855.2861 | 384.50%
[64, 30, 500, 500] | 2913.375177 | 660.5786 | 441.03%
[64, 35, 500, 500] | 2581.44028 | 519.6848 | 496.73%
[64, 40, 500, 500] | 2479.42023 | 714.7851 | 346.88%
[64, 45, 500, 500] | 2300.442591 | 625.1124 | 368.00%
[64, 50, 500, 500] | 2160.407494 | 549.2164 | 393.36%
[16, 25, 512, 512] | 1067.593284 | 332.028 | 321.54%
[32, 25, 512, 512] | 1830.461068 | 649.8168 | 281.69%
[64, 25, 512, 512] | 2827.429465 | 1187.243 | 238.15%
[128, 25, 512, 512] | 3938.397784 | 1547.932 | 254.43%
[16, 25, 1024, 1024] | 231.900727 | 154.7335 | 149.87%
[32, 25, 1024, 1024] | 429.570455 | 298.2182 | 144.05%
[64, 25, 1024, 1024] | 744.384772 | 480.4162 | 154.95%
[128, 25, 1024, 1024] | 1204.706856 | 696.3014 | 173.02%
[16, 25, 2048, 2048] | 52.323166 | 40.81776 | 128.19%
[32, 25, 2048, 2048] | 101.108405 | 78.72398 | 128.43%
[64, 25, 2048, 2048] | 181.117374 | 131.4923 | 137.74%
[128, 25, 2048, 2048] | 315.360515 | 223.4272 | 141.15%
[16, 25, 4096, 4096] | 12.326611 | 9.575337 | 128.73%
[32, 25, 4096, 4096] | 24.255487 | 18.75816 | 129.31%
[64, 25, 4096, 4096] | 44.229753 | 34.00344 | 130.07%
[128, 25, 4096, 4096] | 78.146907 | 64.36427 | 121.41%
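
The Boost column appears to be the fused throughput divided by the unfused throughput, expressed as a percentage. This reading is inferred from the table values rather than stated by the author. The short sketch below recomputes it for three rows copied from the LSTM table; the function name `boost_percent` is illustrative.

```python
def boost_percent(fused_sps, unfused_sps):
    """Fused throughput as a percentage of unfused throughput."""
    return 100.0 * fused_sps / unfused_sps


# (input shape, fused sentences/s, unfused sentences/s) from the LSTM table
rows = [
    ("[64, 15, 500, 500]", 2917.237852, 1667.527),
    ("[64, 50, 500, 500]", 2160.407494, 549.2164),
    ("[128, 25, 4096, 4096]", 78.146907, 64.36427),
]

for shape, fused, unfused in rows:
    print(f"{shape}: {boost_percent(fused, unfused):.2f}%")
```

A value of 100% would mean no difference between the two kernels, so a Boost of 174.94% corresponds to roughly 1.75x the unfused throughput.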

**1-layer GRU fused vs. unfused**

Input Shape (N, T, C, Input Size) | Fused (sentences/s) | Unfused (sentences/s) | Boost (Fused / Unfused)
-- | -- | -- | --
[64, 15, 500, 500] | 3981.266 | 1714.903 | 232.16%
[64, 20, 500, 500] | 3686.065 | 1316.712 | 279.94%
[64, 25, 500, 500] | 3430.645 | 930.4283 | 368.72%
[64, 30, 500, 500] | 3130.724 | 722.1599 | 433.52%
[64, 35, 500, 500] | 2982.695 | 692.9842 | 430.41%
[64, 40, 500, 500] | 2857.4 | 621.988 | 459.40%
[64, 45, 500, 500] | 2598.724 | 533.6256 | 486.99%
[64, 50, 500, 500] | 2364.662 | 498.7772 | 474.09%
[16, 25, 512, 512] | 1066.644 | 278.212 | 383.39%
[32, 25, 512, 512] | 1861.235 | 540.8459 | 344.13%
[64, 25, 512, 512] | 3089.303 | 1020.799 | 302.64%
[128, 25, 512, 512] | 4679.54 | 1636.657 | 285.92%
[16, 25, 1024, 1024] | 317.5073 | 163.0825 | 194.69%
[32, 25, 1024, 1024] | 584.9791 | 318.4931 | 183.67%
[64, 25, 1024, 1024] | 1051.927 | 552.1558 | 190.51%
[128, 25, 1024, 1024] | 1568.747 | 814.037 | 192.71%
[16, 25, 2048, 2048] | 64.3481 | 50.81243 | 126.64%
[32, 25, 2048, 2048] | 124.1267 | 99.61789 | 124.60%
[64, 25, 2048, 2048] | 227.109 | 170.9884 | 132.82%
[128, 25, 2048, 2048] | 376.7918 | 279.1985 | 134.95%
[16, 25, 4096, 4096] | 14.59219 | 12.47552 | 116.97%
[32, 25, 4096, 4096] | 28.75226 | 24.61517 | 116.81%
[64, 25, 4096, 4096] | 52.63095 | 44.60013 | 118.01%
[128, 25, 4096, 4096] | 95.56435 | 83.10091 | 115.00%



[ Full content available at: 
https://github.com/apache/incubator-mxnet/pull/12591 ]