[GitHub] [incubator-tvm] FrozenGene commented on pull request #5485: [TOPI][Winograd] Optimization of Conv2d Winograd algorithm on Tensor …

GitBox Thu, 30 Apr 2020 05:07:31 -0700


FrozenGene commented on pull request #5485:
URL: https://github.com/apache/incubator-tvm/pull/5485#issuecomment-621791710



   For performance, have you tried other layouts? I have some experience on CPU. 
A more suitable layout on CPU for NHWC input is:
   
   ```
     input_tile: alpha, alpha, P, CI
     data_pack: alpha, alpha, P, CI
     bgemm: alpha, alpha, P, CO
     inverse: m, m, P, CO
     output: N H W CO
     kernel: alpha alpha CO CI
   ```
   For the kernel, I chose `alpha alpha CO CI` because I want to vectorize over CI. 
On GPU, `alpha alpha CI CO` may be better.
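
To make the shapes above concrete, here is a rough NumPy sketch (not TVM/TOPI code). It assumes stride 1 with "same" padding, an output tile size `m = 4`, a 3x3 kernel (`r = 3`), and that `P` counts tiles as `N * ceil(H / m) * ceil(W / m)`. The function names `winograd_layout_shapes` and `bgemm` are hypothetical, chosen only for this illustration.

```python
import numpy as np


def winograd_layout_shapes(N, H, W, CI, CO, m=4, r=3):
    """Return the buffer shapes for the NHWC layout listed above.

    Assumptions (not stated in the comment): stride 1, "same" padding,
    so the output spatial size equals the input spatial size.
    """
    alpha = m + r - 1                 # transformed tile size
    tiles_h = (H + m - 1) // m        # ceil(H / m)
    tiles_w = (W + m - 1) // m        # ceil(W / m)
    P = N * tiles_h * tiles_w         # total number of tiles
    return {
        "input_tile": (alpha, alpha, P, CI),
        "data_pack": (alpha, alpha, P, CI),
        "kernel": (alpha, alpha, CO, CI),
        "bgemm": (alpha, alpha, P, CO),
        "inverse": (m, m, P, CO),
        "output": (N, H, W, CO),
    }


def bgemm(data_pack, kernel):
    """Batched GEMM over the alpha x alpha positions.

    data_pack: (alpha, alpha, P, CI)
    kernel:    (alpha, alpha, CO, CI)
    result:    (alpha, alpha, P, CO)

    CI is the innermost (contiguous) axis of both operands, which is
    the axis the reduction runs over.
    """
    return np.einsum("abpc,aboc->abpo", data_pack, kernel)


if __name__ == "__main__":
    # Reading the workload (1, 56, 56, 64, 64) as (N, H, W, CI, CO)
    # is an assumption made for this sketch.
    shapes = winograd_layout_shapes(1, 56, 56, 64, 64)
    for name, shape in shapes.items():
        print(name, shape)
```

With CI innermost in both `data_pack` and `kernel`, the reduction reads contiguous memory from both operands, which is the property the `alpha alpha CO CI` kernel layout is meant to provide for CPU vectorization.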
   
   I tested your layout against the layout I mentioned. On skylake-512, your 
layout takes 0.388ms and the layout I mentioned takes 0.375ms. I used 20 threads 
on the workload (1, 56, 56, 64, 64). The results reproduce stably.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
