masahi commented on pull request #10423:
URL: https://github.com/apache/tvm/pull/10423#issuecomment-1055850455


   > On a related note, from more testing on my larger model (a StyleGAN 
generator), I've found that it's actually faster to split up grouped conv2d and 
conv2d_transpose into multiple non-grouped versions (except for depthwise ones).
   > 
   > Not sure if that'll be the case for all models (it might also be the 
downstream optimization passes that make this faster), but maybe it makes sense 
to add an option that allows splitting grouped convs rather than lowering 
them to the possibly slower true grouped conv ops?
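   For illustration, the splitting described above can be sketched in plain NumPy (toy direct convolutions, not TVM ops; all function names here are made up). A grouped conv2d is mathematically equivalent to slicing the input channels into groups, running an ordinary non-grouped conv2d per group, and concatenating the results:

```python
import numpy as np

def conv2d(x, w):
    # Naive non-grouped NCHW direct convolution, stride 1, no padding.
    n, cin, h, wdt = x.shape
    cout, _, kh, kw = w.shape
    oh, ow = h - kh + 1, wdt - kw + 1
    out = np.zeros((n, cout, oh, ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            patch = x[:, :, i:i + kh, j:j + kw]  # (n, cin, kh, kw)
            out[:, :, i, j] = np.tensordot(patch, w, axes=([1, 2, 3], [1, 2, 3]))
    return out

def grouped_conv2d(x, w, groups):
    # Reference "true" grouped conv: each output channel only sees the
    # input-channel slice belonging to its group.
    n, cin, h, wdt = x.shape
    cout, cg_in, kh, kw = w.shape  # weight holds cin // groups input channels
    oh, ow = h - kh + 1, wdt - kw + 1
    cg_out = cout // groups
    out = np.zeros((n, cout, oh, ow), dtype=x.dtype)
    for oc in range(cout):
        g = oc // cg_out
        xg = x[:, g * cg_in:(g + 1) * cg_in]
        for i in range(oh):
            for j in range(ow):
                out[:, oc, i, j] = np.sum(
                    xg[:, :, i:i + kh, j:j + kw] * w[oc], axis=(1, 2, 3))
    return out

def split_grouped_conv2d(x, w, groups):
    # The splitting approach: one non-grouped conv2d per group, concatenated
    # along the channel axis. Produces the same result as grouped_conv2d.
    cg_in = x.shape[1] // groups
    cg_out = w.shape[0] // groups
    return np.concatenate(
        [conv2d(x[:, g * cg_in:(g + 1) * cg_in], w[g * cg_out:(g + 1) * cg_out])
         for g in range(groups)],
        axis=1)
```

   Which form runs faster on a given target is exactly the question autotvm tuning is meant to settle empirically.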
   
   A good way is to register both implementations in the op strategy, and let 
autotvm decide which one to use based on tuning results. See for example 
https://github.com/apache/tvm/blob/40f881b958950c4a550d9c9c3e0f0778ec21960b/python/tvm/relay/op/strategy/cuda.py#L996-L1016
   
   But if you are using the auto-scheduler, this is not supported: you are 
stuck with whatever schedule the auto-scheduler finds. That's probably still 
better than the autotvm one you've modified in this PR, though. Have you 
actually run the auto-scheduler on your new CUDA conv2d_transpose compute with 
groups?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
