kice opened a new issue #4523: Optimization for subpixel layer on Tensor core
URL: https://github.com/apache/incubator-tvm/issues/4523

I found that the Depth_to_Space layer spends too much time changing the data layout (NHWC <-> NCHW) when using Tensor Cores. Up to 25% of the run time goes to the transposes. Is it possible to reduce this kind of unnecessary data manipulation, for example by combining the reshape and/or transpose into one op?

A sample network:

```
scale = 4
conv(3, 64)
conv(64, scale**2)
subpixel(scale)
conv(64 // scale**2, 3)
```

It then runs:

```
scale = 4
nchwToNhwc()
conv(3, 64)
conv(64, scale**2)
nhwcToNchw()
reshape and transpose
nchwToNhwc()
conv(64 // scale**2, 3)
nhwcToNchw()
```

I think nchwToNhwc is done automatically by CUDA, so converting the whole network to NHWC before using Tensor Cores may be a better choice. Alternatively, a feature like the one in this PR could help: https://github.com/apache/incubator-tvm/pull/4335
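The reshape/transpose pair behind a subpixel (depth-to-space) layer does not itself require NCHW. The minimal NumPy sketch below assumes DCR channel ordering (the ordering used by the default mode of ONNX DepthToSpace); the function names are hypothetical and are not TVM's API. It checks that the sequence NHWC -> NCHW -> depth-to-space -> NHWC gives the same result as a single reshape/transpose/reshape done directly in NHWC, which is the kind of fused op the issue asks about.

```python
import numpy as np


def depth_to_space_nchw(x: np.ndarray, block: int) -> np.ndarray:
    """Depth-to-space on an NCHW tensor, assuming DCR channel ordering."""
    n, c, h, w = x.shape
    c_out = c // (block * block)
    # Split channels into (row offset i, column offset j, output channel k).
    y = x.reshape(n, block, block, c_out, h, w)
    # Interleave the offsets with the spatial axes: (n, k, h, i, w, j).
    y = y.transpose(0, 3, 4, 1, 5, 2)
    return y.reshape(n, c_out, h * block, w * block)


def depth_to_space_nhwc(x: np.ndarray, block: int) -> np.ndarray:
    """The same rearrangement done directly in NHWC (hypothetical fused op)."""
    n, h, w, c = x.shape
    c_out = c // (block * block)
    y = x.reshape(n, h, w, block, block, c_out)
    # (n, h, i, w, j, k): channels stay innermost, so no layout change is needed.
    y = y.transpose(0, 1, 3, 2, 4, 5)
    return y.reshape(n, h * block, w * block, c_out)


def roundtrip_path(x_nhwc: np.ndarray, block: int) -> np.ndarray:
    """The path described in the issue: NHWC -> NCHW, depth-to-space, NCHW -> NHWC."""
    x_nchw = x_nhwc.transpose(0, 3, 1, 2)
    y_nchw = depth_to_space_nchw(x_nchw, block)
    return y_nchw.transpose(0, 2, 3, 1)


if __name__ == "__main__":
    scale = 4
    # NHWC input with scale**2 channels, as produced by conv(64, scale**2).
    x = np.random.rand(1, 8, 8, scale ** 2).astype(np.float32)
    direct = depth_to_space_nhwc(x, scale)
    print(direct.shape)
    print(np.array_equal(direct, roundtrip_path(x, scale)))
```

Because depth-to-space is pure data movement, both paths produce bit-identical results; the NHWC version simply skips the two layout conversions around the reshape/transpose.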
