FrozenGene commented on pull request #5485:
URL: https://github.com/apache/incubator-tvm/pull/5485#issuecomment-624109424


   > @FrozenGene 
   > 
   > We tested the layout as you suggested, and the results are listed below. 
   > 
   > 
   > kernel: 3x3x64x64, feature maps: 56x56x64:
   > 
   > | batch | kernel (alpha, alpha, ci, co) | kernel (alpha, alpha, co, ci) |
   > | ---: | ---: | ---: |
   > | 1 | 0.0762 | 0.0766 |
   > | 2 | 0.0911 | 0.0931 |
   > | 4 | 0.1197 | 0.124 |
   > | 8 | 0.1979 | 0.1942 |
   > | 16 | 0.3453 | 0.3577 |
   > | 32 | 0.6613 | 0.7161 |
   > | 256 | 5.5837 | 5.3269 |
   > 
   > kernel: 3x3x256x256, feature maps: 14x14x256:
   > 
   > | batch | kernel (alpha, alpha, ci, co) | kernel (alpha, alpha, co, ci) |
   > | ---: | ---: | ---: |
   > | 1 | 0.0633 | 0.0694 |
   > | 2 | 0.0825 | 0.0835 |
   > | 4 | 0.1417 | 0.1562 |
   > | 8 | 0.1829 | 0.1853 |
   > | 16 | 0.264 | 0.277 |
   > | 32 | 0.4506 | 0.4799 |
   > | 256 | 3.9432 | 4.0867 |
   > 
   > 
   > 
   > Note: the weight transform was pre-computed. The benchmarks were run on a T4 GPU (16 GB, 70 W). Latency is reported in ms.
   > 
   > 
   > 
   > We can see that the performance of both layouts is at the same level, and the (alpha, alpha, ci, co) kernel layout is slightly better than (alpha, alpha, co, ci) in most cases.
   > 
   
   Thanks for testing on GPU. Just to double-check: has the layout of the input tile / inverse transform, etc. been changed as well? (Though I think the performance should still be at the same level as in your report...)

