masahi commented on pull request #10423: URL: https://github.com/apache/tvm/pull/10423#issuecomment-1055850455
> On a related note, from more testing on my larger model (a StyleGAN generator), I've found that it's actually faster to split up grouped conv2d and conv2d_transpose into multiple non-grouped versions (except for depthwise ones).
>
> Not sure if that'll be the case for all models (it might also be the downstream optimization passes that make this faster), but maybe it makes sense to add an option to allow splitting of grouped convs rather than transferring them to the possibly slower true grouped conv ops?

A good way to handle this is to register both implementations in the op strategy and let AutoTVM decide which one to use based on the tuning results. See, for example, https://github.com/apache/tvm/blob/40f881b958950c4a550d9c9c3e0f0778ec21960b/python/tvm/relay/op/strategy/cuda.py#L996-L1016

If you are using the auto-scheduler, however, this is not supported: you are stuck with whatever the auto-scheduler finds. But that is probably still better than the AutoTVM schedule you've modified in this PR. Have you actually run the auto-scheduler on your new CUDA conv2d_transpose compute with groups?
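For reference, the equivalence behind the split trick in the quote can be sketched in plain NumPy (this is an illustrative sketch, not TVM code: NCHW layout, stride 1, no padding, and the `conv2d_nchw` / `grouped_conv2d_via_split` helper names are made up here). A grouped conv is exactly `groups` independent dense convs over channel slices, concatenated along the output-channel axis:

```python
import numpy as np

def conv2d_nchw(data, weight):
    # Naive dense (non-grouped) conv2d, NCHW layout, stride 1, no padding.
    n, cin, h, w = data.shape
    cout, _, kh, kw = weight.shape
    out = np.zeros((n, cout, h - kh + 1, w - kw + 1), dtype=data.dtype)
    for i in range(out.shape[2]):
        for j in range(out.shape[3]):
            patch = data[:, :, i:i + kh, j:j + kw]  # (n, cin, kh, kw)
            # Contract over (cin, kh, kw) against weight (cout, cin, kh, kw).
            out[:, :, i, j] = np.tensordot(patch, weight, axes=([1, 2, 3], [1, 2, 3]))
    return out

def grouped_conv2d_via_split(data, weight, groups):
    # Realize a grouped conv as `groups` dense convs over channel slices.
    cin, cout = data.shape[1], weight.shape[0]
    cin_g, cout_g = cin // groups, cout // groups
    parts = []
    for g in range(groups):
        d = data[:, g * cin_g:(g + 1) * cin_g]       # this group's input channels
        w = weight[g * cout_g:(g + 1) * cout_g]      # this group's filters
        parts.append(conv2d_nchw(d, w))
    return np.concatenate(parts, axis=1)
```

Numerically this matches a dense conv whose weight tensor is block-diagonal across channel groups, which is why the split version can win when the backend's dense conv kernels are better tuned than the true grouped ones.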
