JCBrouwer edited a comment on pull request #10423: URL: https://github.com/apache/tvm/pull/10423#issuecomment-1055433768
On a related note, from more testing on my larger model (a StyleGAN generator), I've found that it's actually faster to split grouped conv2d and conv2d_transpose ops into multiple non-grouped versions (except for depthwise ones). I'm not sure whether that holds for all models (it might also be the downstream optimization passes that make this faster), but maybe it makes sense to add an option that splits grouped convs rather than converting them to the possibly slower true grouped conv ops?

```
name                                        time ms    fps
G_tvm_target=cuda_split=none                  44886    0.0223
G_tvm_target=cuda_split=transpose             17853    0.0560
G_tvm_target=cuda_split=both                  16114    0.6206
G_tvm_target=cuda,cutlass_split=none          44887    0.0223
G_tvm_target=cuda,cutlass_split=transpose     17852    0.0560
G_tvm_target=cuda,cutlass_split=both          16116    0.6205
G_tvm_target=cuda,cudnn_split=none               59   16.8368
G_tvm_target=cuda,cudnn_split=transpose          61   16.3961
G_tvm_target=cuda,cudnn_split=both               49   20.4347
```

(N.B. I think I'm compiling the CUTLASS target wrong, because the results are suspiciously close to the regular CUDA target)
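For reference, the splitting trick above relies on the fact that a grouped conv2d is numerically equivalent to slicing the input and weight channels per group, running an ordinary (non-grouped) conv2d on each slice, and concatenating the outputs along the channel axis. The sketch below is not TVM code, just a minimal pure-Python reference (stride 1, no padding, NCHW-style `[C][H][W]` layout with the batch dimension dropped) that demonstrates the equivalence the rewrite depends on:

```python
def conv2d(x, w):
    # Plain conv2d. x: [C_in][H][W], w: [C_out][C_in][KH][KW].
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    c_out, kh, kw = len(w), len(w[0][0]), len(w[0][0][0])
    oh, ow = h - kh + 1, wd - kw + 1
    out = [[[0.0] * ow for _ in range(oh)] for _ in range(c_out)]
    for co in range(c_out):
        for i in range(oh):
            for j in range(ow):
                s = 0.0
                for ci in range(c_in):
                    for a in range(kh):
                        for b in range(kw):
                            s += x[ci][i + a][j + b] * w[co][ci][a][b]
                out[co][i][j] = s
    return out


def grouped_conv2d(x, w, groups):
    # Direct grouped conv2d: w is [C_out][C_in // groups][KH][KW] and
    # output channel co only reads input channels of its own group.
    c_in, c_out = len(x), len(w)
    cig, cog = c_in // groups, c_out // groups
    kh, kw = len(w[0][0]), len(w[0][0][0])
    oh, ow = len(x[0]) - kh + 1, len(x[0][0]) - kw + 1
    out = [[[0.0] * ow for _ in range(oh)] for _ in range(c_out)]
    for co in range(c_out):
        g = co // cog  # which group this output channel belongs to
        for i in range(oh):
            for j in range(ow):
                s = 0.0
                for ci in range(cig):
                    for a in range(kh):
                        for b in range(kw):
                            s += x[g * cig + ci][i + a][j + b] * w[co][ci][a][b]
                out[co][i][j] = s
    return out


def split_grouped_conv2d(x, w, groups):
    # The split rewrite: one non-grouped conv2d per group, outputs
    # concatenated along the channel axis.
    cig, cog = len(x) // groups, len(w) // groups
    out = []
    for g in range(groups):
        out.extend(conv2d(x[g * cig:(g + 1) * cig],
                          w[g * cog:(g + 1) * cog]))
    return out
```

Both functions accumulate in the same channel/kernel order, so the two results match exactly; in a real pipeline the split version just exposes each per-group conv to whatever schedules and libraries handle plain conv2d well.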
