wsl-inspur edited a comment on pull request #5485:
URL: https://github.com/apache/incubator-tvm/pull/5485#issuecomment-625091214


   @FrozenGene 
   We ran the following tests to cover all the layouts.
   
   Layout_1 (the same as in this PR)
   input_tile = (P, CI, alpha, alpha)
   data_pack = (alpha, alpha, P, CI)
   bgemm = (alpha, alpha, P, CO)
   inverse = (P, CO, m, m)
   output = (N, H, W, CO)
   kernel = (alpha, alpha, CI, CO)
   
   Layout_2
   input_tile = (P, CI, alpha, alpha)
   data_pack = (alpha, alpha, P, CI)
   bgemm = (alpha, alpha, P, CO)
   inverse = (P, CO, m, m)
   output = (N, H, W, CO)
   kernel = (alpha, alpha, CO, CI)
   
   Layout_3
   input_tile = (alpha, alpha, P, CI)
   data_pack = (alpha, alpha, P, CI)
   bgemm = (alpha, alpha, P, CO)
   inverse = (P, CO, m, m)
   output = (N, H, W, CO)
   kernel = (alpha, alpha, CI, CO)
   
   Layout_4
   input_tile = (alpha, alpha, P, CI)
   data_pack = (alpha, alpha, P, CI)
   bgemm = (alpha, alpha, P, CO)
   inverse = (m, m, P, CO)
   output = (N, H, W, CO)
   kernel = (alpha, alpha, CO, CI)
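   For reference, a minimal NumPy sketch of the Layout_1 shape flow (a hedged illustration only: tile size `m = 4`, kernel size `r = 3`, and the use of `einsum` in place of the tuned bgemm schedule are all assumptions, not code from this PR; `P` is the number of output tiles):
   
   ```python
   import numpy as np
   
   # Hypothetical Winograd F(m x m, r x r) parameters (not from the PR).
   N, H, W, CI, CO = 1, 56, 56, 64, 64
   m, r = 4, 3                        # output tile size, kernel size
   alpha = m + r - 1                  # transformed tile size: 6
   P = N * -(-H // m) * -(-W // m)    # number of tiles, ceil division: 196
   
   # Layout_1 shapes from the comment above:
   input_tile = np.zeros((P, CI, alpha, alpha))
   data_pack = input_tile.transpose(2, 3, 0, 1)      # (alpha, alpha, P, CI)
   kernel = np.zeros((alpha, alpha, CI, CO))         # pre-transformed weight
   
   # bgemm: one (P, CI) x (CI, CO) matmul per point of the alpha x alpha grid.
   bgemm = np.einsum('abpc,abco->abpo', data_pack, kernel)  # (alpha, alpha, P, CO)
   
   # Inverse transform maps each alpha x alpha tile back to an m x m tile,
   # which is then scattered into the NHWC output.
   inverse = np.zeros((P, CO, m, m))
   output = np.zeros((N, H, W, CO))
   ```
   
   The other three layouts differ only in how `input_tile`, `inverse`, and `kernel` are laid out; the bgemm shape is identical in all four.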
   
   The results are listed below.
   
   kernel: 3x3x64x64, feature map: 56x56x64
   batch | layout_1 | layout_2 | layout_3 | layout_4
   -- | -- | -- | -- | --
   1 | 0.0762 | 0.0766 | 0.0758 | 0.0907
   2 | 0.0911 | 0.0931 | 0.0957 | 0.0939
   4 | 0.1197 | 0.124 | 0.1188 | 0.1257
   8 | 0.1979 | 0.1942 | 0.2026 | 0.2208
   16 | 0.3453 | 0.3577 | 0.3427 | 0.3833
   32 | 0.6613 | 0.7161 | 0.6615 | 0.7574
   256 | 5.5837 | 5.3269 | 5.772 | 5.7058
   
   kernel: 3x3x256x256, feature map: 14x14x256
   
   batch | layout_1 | layout_2 | layout_3 | layout_4
   -- | -- | -- | -- | --
   1 | 0.0633 | 0.0694 | 0.0703 | 0.656
   2 | 0.0825 | 0.0835 | 0.0822 | 0.851
   4 | 0.1417 | 0.1562 | 0.1382 | 0.1502
   8 | 0.1829 | 0.1853 | 0.1833 | 0.182
   16 | 0.264 | 0.277 | 0.2832 | 0.2621
   32 | 0.4506 | 0.4799 | 0.4507 | 0.4773
   256 | 3.9432 | 4.0867 | 3.5088 | 4.5032
   
   Note: the weight transform was pre-computed. The benchmarks were run on a T4 GPU (16 GB, 70 W). Latency is reported in ms.
   
   We can see that all layouts perform at the same level.
   

