kice opened a new issue #4523: Optimization for subpixel layer on Tensor core
URL: https://github.com/apache/incubator-tvm/issues/4523
 
 
   I found that the Depth_to_Space layer spends too much time changing the data 
layout (NHWC <-> NCHW) when using tensor cores. The transposes take up to 25% 
of the run time.
   
   Is it possible to reduce this kind of unnecessary data manipulation, for 
example by combining the reshape and/or transpose into one op?
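
   To illustrate what "one op" could mean here, below is a minimal pure-Python 
sketch of a subpixel (depth-to-space) step that reads an NHWC tensor and writes 
the NHWC result in a single pass, with no intermediate reshape, transpose, or 
layout change. The function name `subpixel_nhwc` is hypothetical and is not a 
TVM API. The channel ordering (`c_in = c_out * scale**2 + i * scale + j`) is an 
assumption borrowed from the usual pixel-shuffle convention; other frameworks 
order the channels differently.

```python
def subpixel_nhwc(x, scale):
    """Depth-to-space on a nested-list tensor x[n][h][w][c] (NHWC).

    Hypothetical helper, not part of TVM. Assumes the input has
    c_out * scale**2 channels ordered as c_out * scale**2 + i * scale + j,
    where (i, j) is the position inside each scale x scale output tile.
    """
    r = scale
    batch = len(x)
    height = len(x[0])
    width = len(x[0][0])
    channels = len(x[0][0][0])
    if channels % (r * r) != 0:
        raise ValueError("channel count must be divisible by scale**2")
    c_out = channels // (r * r)

    # Each output element is gathered directly from its source element,
    # so the whole op is one indexed copy in NHWC.
    return [
        [
            [
                [
                    x[n][oh // r][ow // r][c * r * r + (oh % r) * r + (ow % r)]
                    for c in range(c_out)
                ]
                for ow in range(width * r)
            ]
            for oh in range(height * r)
        ]
        for n in range(batch)
    ]


# One pixel with 4 channels becomes a 2x2 tile with 1 channel.
tile = subpixel_nhwc([[[[0, 1, 2, 3]]]], scale=2)
```

   A real kernel would express the same index arithmetic on the GPU; the point 
of the sketch is only that the mapping does not require NCHW.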
   
   A sample network
   ```
   scale = 4
   
   conv(3, 64),
   conv(64, scale**2),
   subpixel(scale),
   conv(64 // scale**2, 3)
   ```
   
   It is then executed as:
   ```
   scale = 4
   
   nchwToNhwc(), 
   conv(3, 64),
   conv(64, scale**2),
   nhwcToNchw(),
   reshape and transpose
   nchwToNhwc(), 
   conv(64 // scale**2, 3)
   nhwcToNchw(),
   ```
   I think nchwToNhwc is done automatically by CUDA. Maybe converting the 
whole network to NHWC before using tensor cores would be a better choice.
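
   As a rough sketch of that idea, the toy pass below walks a list of op names 
and removes an `nhwcToNchw` / `nchwToNhwc` pair when the op between them could 
run directly in NHWC. Everything here is hypothetical: the function 
`drop_redundant_transposes`, the string op names, and the assumption that an 
NHWC-capable subpixel implementation exists. It is not how TVM represents or 
rewrites graphs.

```python
def drop_redundant_transposes(ops, nhwc_capable):
    """Remove NHWC->NCHW->NHWC round trips around NHWC-capable ops.

    ops: list of op names in execution order (toy representation).
    nhwc_capable: set of op names assumed to have an NHWC implementation.
    """
    out = []
    for op in ops:
        round_trip = (
            op == "nchwToNhwc"
            and len(out) >= 2
            and out[-2] == "nhwcToNchw"
            and out[-1] in nhwc_capable
        )
        if round_trip:
            middle = out.pop()  # the op that sat between the two transposes
            out.pop()           # drop the preceding nhwcToNchw
            out.append(middle)  # keep the op, which now runs in NHWC
        else:
            out.append(op)
    return out


executed = [
    "nchwToNhwc", "conv", "conv", "nhwcToNchw",
    "subpixel",
    "nchwToNhwc", "conv", "nhwcToNchw",
]
optimized = drop_redundant_transposes(executed, nhwc_capable={"subpixel"})
```

   With the sequence from this issue, only the first and last layout 
conversions remain, so the data stays in NHWC between the convolutions.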
   
   Or perhaps a feature like the one in this PR: 
https://github.com/apache/incubator-tvm/pull/4335
