JCBrouwer opened a new pull request #10423: URL: https://github.com/apache/tvm/pull/10423
Following @masahi's advice in #10223, I've added a grouped version of `conv2d_transpose_nchw` to the CUDA backend. The tests pass, and I've verified on a larger model that uses `group_conv2d_transpose_nchw` ops that the output is correct.

I wasn't quite sure how to implement the topi schedule for the op, so I've done my best to merge the regular `conv2d_transpose_nchw.cuda` schedule with the `group_conv2d_nchw.cuda` schedule. So far the CUDNN backend's `group_conv2d_transpose` is quite a bit faster, although that is to be expected if my schedule isn't very efficient. I haven't tried a full auto-scheduling run without CUDNN yet.

PS: the diff between the topi schedules is a little confusing. I haven't made any changes to the `conv2d_transpose_nchw.cuda` schedule itself, just factored it out into a helper function `_schedule_conv2d_transpose_nchw` (similar to `_schedule_group_conv2d_nchw` in topi/cuda/group_conv2d.py) so that it can also be called from `schedule_group_conv2d_transpose_nchw` when there is only one group.
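For reviewers unfamiliar with the op's semantics: a grouped transposed convolution is equivalent to slicing the input channels into `groups` pieces, running an independent transposed convolution per slice, and concatenating the results along the output-channel axis. Below is a minimal NumPy sketch of that reference semantics (NCHW data, `(C_in, C_out_per_group, KH, KW)` weight layout as in TVM's transposed-conv ops) — this is an illustrative model for checking correctness, not the TE compute definition from this PR; the stride-only signature and function names are simplifications.

```python
import numpy as np

def conv2d_transpose_nchw(data, weight, stride=1):
    # data: (N, C_in, H, W); weight: (C_in, C_out, KH, KW).
    # Scatter each input pixel's contribution into the output window.
    n, cin, h, w = data.shape
    cin_w, cout, kh, kw = weight.shape
    assert cin == cin_w
    oh = (h - 1) * stride + kh
    ow = (w - 1) * stride + kw
    out = np.zeros((n, cout, oh, ow), dtype=data.dtype)
    for b in range(n):
        for ci in range(cin):
            for co in range(cout):
                for y in range(h):
                    for x in range(w):
                        out[b, co,
                            y * stride:y * stride + kh,
                            x * stride:x * stride + kw] += (
                            data[b, ci, y, x] * weight[ci, co])
    return out

def group_conv2d_transpose_nchw(data, weight, stride=1, groups=1):
    # Split input channels (and the weight's leading axis) into `groups`
    # slices, transpose-convolve each slice independently, then
    # concatenate along the output-channel axis.
    splits_d = np.split(data, groups, axis=1)
    splits_w = np.split(weight, groups, axis=0)
    return np.concatenate(
        [conv2d_transpose_nchw(d, w, stride)
         for d, w in zip(splits_d, splits_w)], axis=1)
```

With `groups=1` this reduces to the plain `conv2d_transpose_nchw`, which is exactly why the shared `_schedule_conv2d_transpose_nchw` helper can serve both schedules in that case.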
