wsl-inspur commented on pull request #5485: URL: https://github.com/apache/incubator-tvm/pull/5485#issuecomment-624103994
@FrozenGene We tested the layout you suggested; the results are listed below.

kernel: 3x3x64x64, feature maps: 56x56x64

batch | kernel (alpha, alpha, ci, co) | kernel (alpha, alpha, co, ci)
---|---|---
1 | 0.0762 | 0.0766
2 | 0.0911 | 0.0931
4 | 0.1197 | 0.124
8 | 0.1979 | 0.1942
16 | 0.3453 | 0.3577
32 | 0.6613 | 0.7161
256 | 5.5837 | 5.3269

kernel: 3x3x256x256, feature maps: 14x14x256

batch | kernel (alpha, alpha, ci, co) | kernel (alpha, alpha, co, ci)
---|---|---
1 | 0.0633 | 0.0694
2 | 0.0825 | 0.0835
4 | 0.1417 | 0.1562
8 | 0.1829 | 0.1853
16 | 0.264 | 0.277
32 | 0.4506 | 0.4799
256 | 3.9432 | 4.0867

Note: the weight transform was pre-computed. The benchmarks were run on a T4 GPU (16 GB, 70 W). Latency is reported in ms.

The performance of the two layouts is at the same level. The (alpha, alpha, ci, co) layout is slightly faster than the (alpha, alpha, co, ci) layout in 12 of the 14 cases; the exceptions are batch 8 and batch 256 with the 3x3x64x64 kernel.
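To make the two layouts concrete, here is a minimal pure-Python sketch of a pre-computed Winograd weight transform U = G g G^T whose result is stored either as (alpha, alpha, ci, co) or as (alpha, alpha, co, ci). It assumes F(2x2, 3x3), so alpha = m + r - 1 = 2 + 3 - 1 = 4, and an HWIO input kernel; the tile size actually used in this PR may differ, and `transform_tile` / `transform_kernel` are hypothetical helpers, not TVM APIs.

```python
# Minimal sketch (assumption): Winograd F(2x2, 3x3) weight transform U = G g G^T,
# applied to every (ci, co) filter of an HWIO 3x3 kernel, then stored in either
# the (alpha, alpha, ci, co) or the (alpha, alpha, co, ci) layout.
# Tile size and helper names are hypothetical, not taken from the PR.

G = [
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5],
    [0.0, 0.0, 1.0],
]
ALPHA = len(G)  # alpha = m + r - 1 = 2 + 3 - 1 = 4


def transform_tile(g):
    """Return the alpha x alpha matrix G @ g @ G^T for one 3x3 filter g."""
    # tmp = G @ g, shape (alpha, 3)
    tmp = [[sum(G[i][k] * g[k][j] for k in range(3)) for j in range(3)]
           for i in range(ALPHA)]
    # U = tmp @ G^T, shape (alpha, alpha)
    return [[sum(tmp[i][k] * G[j][k] for k in range(3)) for j in range(ALPHA)]
            for i in range(ALPHA)]


def transform_kernel(kernel, ci, co, layout="alpha_alpha_ci_co"):
    """kernel[h][w][c][o] (HWIO, 3x3) -> transformed kernel in the requested layout."""
    if layout == "alpha_alpha_ci_co":
        out = [[[[0.0] * co for _ in range(ci)] for _ in range(ALPHA)]
               for _ in range(ALPHA)]
    elif layout == "alpha_alpha_co_ci":
        out = [[[[0.0] * ci for _ in range(co)] for _ in range(ALPHA)]
               for _ in range(ALPHA)]
    else:
        raise ValueError(f"unknown layout: {layout}")
    for c in range(ci):
        for o in range(co):
            # Extract the 3x3 filter for this (input, output) channel pair.
            g = [[kernel[h][w][c][o] for w in range(3)] for h in range(3)]
            u = transform_tile(g)
            for a in range(ALPHA):
                for b in range(ALPHA):
                    if layout == "alpha_alpha_ci_co":
                        out[a][b][c][o] = u[a][b]
                    else:
                        out[a][b][o][c] = u[a][b]
    return out


# Example: an all-ones 3x3 kernel with ci=2 input and co=3 output channels.
ci, co = 2, 3
kernel = [[[[1.0] * co for _ in range(ci)] for _ in range(3)] for _ in range(3)]
u_ci_co = transform_kernel(kernel, ci, co, "alpha_alpha_ci_co")
u_co_ci = transform_kernel(kernel, ci, co, "alpha_alpha_co_ci")
# Same values; only the last two axes are swapped.
assert u_ci_co[1][2][0][2] == u_co_ci[1][2][2][0]
```

Both layouts hold identical values. The difference is which channel axis is innermost (contiguous over co versus contiguous over ci), which changes the memory access pattern of the alpha x alpha batched matrix multiplications in the Winograd stage; that is consistent with the small, mostly one-directional latency gap in the tables.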
