JCBrouwer opened a new pull request #10423:
URL: https://github.com/apache/tvm/pull/10423


   Following @masahi's advice in #10223, I've added a grouped version of 
`conv2d_transpose_nchw` to the CUDA backend.
   
The tests pass, and I've verified that the output is correct on a larger 
model that uses `group_conv2d_transpose_nchw` ops.
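For reviewers, here is a hedged NumPy sketch (not the actual TVM/TOPI code; 
function names are mine) of what the grouped op computes: the input and weight 
channels are split into `groups` slices, each slice gets an independent 
transposed convolution, and the per-group outputs are concatenated.

```python
import numpy as np

def conv2d_transpose(x, w, stride=1):
    # x: (N, C_in, H, W); w: (C_in, C_out, KH, KW) in transposed-conv layout
    n, cin, h, wd = x.shape
    _, cout, kh, kw = w.shape
    oh = (h - 1) * stride + kh
    ow = (wd - 1) * stride + kw
    out = np.zeros((n, cout, oh, ow))
    for i in range(h):
        for j in range(wd):
            # each input pixel scatter-adds a KHxKW patch into the output
            out[:, :, i * stride:i * stride + kh,
                      j * stride:j * stride + kw] += np.einsum(
                "nc,cokl->nokl", x[:, :, i, j], w)
    return out

def group_conv2d_transpose(x, w, stride=1, groups=1):
    # split input channels and weight into `groups` slices, run each group
    # independently, then concatenate the outputs along the channel axis
    xs = np.split(x, groups, axis=1)   # each (N, C_in/G, H, W)
    ws = np.split(w, groups, axis=0)   # each (C_in/G, C_out/G, KH, KW)
    return np.concatenate(
        [conv2d_transpose(xg, wg, stride) for xg, wg in zip(xs, ws)],
        axis=1)
```

With `groups=1` this reduces to the ungrouped op, which is the degenerate case 
the schedule also has to handle.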
   
   I wasn't quite sure how to implement the topi schedule for the op, so I've 
done my best to merge the regular `conv2d_transpose_nchw.cuda` schedule with 
the `group_conv2d_nchw.cuda` schedule.
   
So far the CUDNN backend's `group_conv2d_transpose` seems quite a bit faster 
(although I guess that's to be expected if my schedule isn't very efficient). 
I haven't tried a full auto-scheduling run without CUDNN yet.
   
PS: the diff between the topi schedules is a little confusing to read. I 
haven't changed the `conv2d_transpose_nchw.cuda` schedule itself; I've just 
factored it out into a helper function `_schedule_conv2d_transpose_nchw` 
(similar to `_schedule_group_conv2d_nchw` in topi/cuda/group_conv2d.py) so 
that it can also be called from `schedule_group_conv2d_transpose_nchw` when 
there is only 1 group.
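Roughly, the factoring looks like this (plain-Python stand-ins, not the actual 
TOPI code; the dict payloads are placeholders for the real schedule work):

```python
def _schedule_conv2d_transpose_nchw(attrs):
    # stand-in for the shared single-group schedule body
    # (in the real code: tiling, thread binding, etc.)
    return {"op": "conv2d_transpose_nchw", **attrs}

def schedule_conv2d_transpose_nchw(attrs):
    # public entry point for the ungrouped op: delegates to the helper
    return _schedule_conv2d_transpose_nchw(attrs)

def schedule_group_conv2d_transpose_nchw(attrs, groups):
    if groups == 1:
        # degenerate case: reuse the shared single-group schedule
        return _schedule_conv2d_transpose_nchw(attrs)
    # otherwise apply the grouped variant
    # (merged from the group_conv2d_nchw.cuda schedule)
    return {"op": "group_conv2d_transpose_nchw", "groups": groups, **attrs}
```

So only the helper extraction shows up in the diff for the ungrouped path; the 
grouped path is new.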

