FrozenGene commented on pull request #5485:
URL: https://github.com/apache/incubator-tvm/pull/5485#issuecomment-625176840


   > @FrozenGene
   > We made the following tests to go through all the layouts,
   > 
   > Layout_1(the same in this PR)
   > input_tile = (P, CI, alpha, alpha)
   > data_pack = (alpha, alpha, P, CI)
   > bgemm = (alpha, alpha, P, CO)
   > inverse = (P, CO, m, m)
   > output = (N, H, W, CO)
   > kernel = (alpha, alpha, CI, CO)
   > 
   > Layout_2
   > input_tile = (P, CI, alpha, alpha)
   > data_pack = (alpha, alpha, P, CI)
   > bgemm = (alpha, alpha, P, CO)
   > inverse = (P, CO, m, m)
   > output = (N, H, W, CO)
   > kernel = (alpha, alpha, CO, CI)
   > 
   > Layout_3
   > input_tile = (alpha, alpha, P, CI)
   > data_pack = (alpha, alpha, P, CI)
   > bgemm = (alpha, alpha, P, CO)
   > inverse = (P, CO, m, m)
   > output = (N, H, W, CO)
   > kernel = (alpha, alpha, CI, CO)
   > 
   > Layout_4
   > input_tile = (alpha, alpha, P, CI)
   > data_pack = (alpha, alpha, P, CI)
   > bgemm = (alpha, alpha, P, CO)
   > inverse = (m, m, P, CO)
   > output = (N, H, W, CO)
   > kernel = (alpha, alpha, CO, CI)
   > 
   > The results are listed below.
   > 
   > kernel: 3x3x64x64, feature maps: 56x56x64:
   > 
   > batch   layout_1   layout_2   layout_3   layout_4
   > 1       0.0762     0.0766     0.0758     0.0907
   > 2       0.0911     0.0931     0.0957     0.0939
   > 4       0.1197     0.124      0.1188     0.1257
   > 8       0.1979     0.1942     0.2026     0.2208
   > 16      0.3453     0.3577     0.3427     0.3833
   > 32      0.6613     0.7161     0.6615     0.7574
   > 256     5.5837     5.3269     5.772      5.7058
   > kernel: 3x3x256x256, feature maps: 14x14x256:
   > 
   > batch   layout_1   layout_2   layout_3   layout_4
   > 1       0.0633     0.0694     0.0703     0.656
   > 2       0.0825     0.0835     0.0822     0.851
   > 4       0.1417     0.1562     0.1382     0.1502
   > 8       0.1829     0.1853     0.1833     0.182
   > 16      0.264      0.277      0.2832     0.2621
   > 32      0.4506     0.4799     0.4507     0.4773
   > 256     3.9432     4.0867     3.5088     4.5032
   > Note: the weight transform was pre-computed. The benchmarks were run on a T4 GPU (16 GB, 70 W); latency is reported in ms.
   > 
   > We can see that the performance of all the layouts is at the same level.
   
   Makes sense to me.
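   For concreteness, the bgemm step under Layout_1 can be sketched in NumPy as below. The shapes (data_pack `(alpha, alpha, P, CI)`, kernel `(alpha, alpha, CI, CO)`, bgemm `(alpha, alpha, P, CO)`) come from the layout list above; the F(2x2, 3x3) tiling (m = 2, r = 3) and the tile count `P = N * (H // m) * (W // m)` are illustrative assumptions, not values taken from the PR.

   ```python
   import numpy as np

   # Assumed Winograd F(2x2, 3x3) parameters for the 56x56x64 case.
   N, H, W, CI, CO = 1, 56, 56, 64, 64
   m, r = 2, 3
   alpha = m + r - 1              # transform tile size: 4
   P = N * (H // m) * (W // m)    # number of output tiles (assumed tiling)

   # Layout_1 shapes from the comment above.
   data_pack = np.zeros((alpha, alpha, P, CI), dtype="float32")
   kernel = np.zeros((alpha, alpha, CI, CO), dtype="float32")

   # bgemm: one (P, CI) x (CI, CO) matmul per (alpha, alpha) grid point,
   # producing (alpha, alpha, P, CO).
   bgemm = np.einsum("abpc,abck->abpk", data_pack, kernel)
   print(bgemm.shape)  # (4, 4, 784, 64)
   ```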


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
