FrozenGene commented on pull request #5485: URL: https://github.com/apache/incubator-tvm/pull/5485#issuecomment-624109424
> @FrozenGene
>
> We tested the layout as you suggested, and the results are listed below.
>
> kernel: 3x3x64x64, feature maps: 56x56x64:
>
> | batch | kernel (alpha, alpha, ci, co) | kernel (alpha, alpha, co, ci) |
> | ----- | ----------------------------- | ----------------------------- |
> | 1     | 0.0762                        | 0.0766                        |
> | 2     | 0.0911                        | 0.0931                        |
> | 4     | 0.1197                        | 0.124                         |
> | 8     | 0.1979                        | 0.1942                        |
> | 16    | 0.3453                        | 0.3577                        |
> | 32    | 0.6613                        | 0.7161                        |
> | 256   | 5.5837                        | 5.3269                        |
>
> kernel: 3x3x256x256, feature maps: 14x14x256:
>
> | batch | kernel (alpha, alpha, ci, co) | kernel (alpha, alpha, co, ci) |
> | ----- | ----------------------------- | ----------------------------- |
> | 1     | 0.0633                        | 0.0694                        |
> | 2     | 0.0825                        | 0.0835                        |
> | 4     | 0.1417                        | 0.1562                        |
> | 8     | 0.1829                        | 0.1853                        |
> | 16    | 0.264                         | 0.277                         |
> | 32    | 0.4506                        | 0.4799                        |
> | 256   | 3.9432                        | 4.0867                        |
>
> Note: the weight transform was pre-computed. The benchmarks were run on a T4 GPU (16 GB, 70 W). Latency is reported in ms.
>
> We can see that the performance of both layouts is at the same level, and the kernel with the (alpha, alpha, ci, co) layout is slightly better than the (alpha, alpha, co, ci) layout in most cases.

Thanks for testing on GPU. Just to double-check: has the layout of the input tile / inverse transform and so on been changed too? (Though I expect the performance would be at the same level as your report either way...)
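For readers following the layout discussion: the two kernel layouts being compared hold the same pre-computed Winograd-transformed weights, differing only in the order of the channel axes. A minimal NumPy sketch of the relationship, assuming F(4x4, 3x3) Winograd (so alpha = 4 + 3 - 1 = 6) and the 64-channel configuration from the first benchmark (all names here are illustrative, not from the PR):

```python
import numpy as np

# Assumed shapes: 3x3 kernel with CI = CO = 64, Winograd F(4x4, 3x3)
# gives a transform tile size of alpha = 4 + 3 - 1 = 6.
alpha, ci, co = 6, 64, 64

# Pre-computed transformed weight in the (alpha, alpha, ci, co) layout.
w_ci_co = np.random.rand(alpha, alpha, ci, co).astype("float32")

# The alternative (alpha, alpha, co, ci) layout swaps the last two axes;
# the data volume is identical, only the memory access pattern changes.
w_co_ci = w_ci_co.transpose(0, 1, 3, 2)

assert w_co_ci.shape == (alpha, alpha, co, ci)
# Each (alpha, alpha) tile in one layout is the transpose of the
# corresponding tile in the other.
assert np.allclose(w_co_ci[0, 0], w_ci_co[0, 0].T)
```

Since the two layouts carry identical data, any throughput difference comes purely from how the batched GEMM inside the Winograd convolution strides over the channel axes, which is consistent with the near-identical timings reported above.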
