JCBrouwer edited a comment on pull request #10423: URL: https://github.com/apache/tvm/pull/10423#issuecomment-1055433768
On a related note, from more testing on my larger model (a StyleGAN generator), I've found that it's actually faster to split grouped conv2d and conv2d_transpose ops into multiple non-grouped versions (except for depthwise ones). I'm not sure whether that holds for all models (it might also be the downstream optimization passes that make this faster), but maybe it makes sense to add an option that splits grouped convs rather than converting them to the possibly slower true grouped conv ops?

```
name                                        time ms    fps
G_tvm_target=cuda_split=none                  44886    0.0223
G_tvm_target=cuda_split=transpose             17853    0.0560
G_tvm_target=cuda_split=both                  16114    0.6206
G_tvm_target=cuda,cutlass_split=none          44887    0.0223
G_tvm_target=cuda,cutlass_split=transpose     17852    0.0560
G_tvm_target=cuda,cutlass_split=both          16116    0.6205
G_tvm_target=cuda,cudnn_split=none               59   16.8368
G_tvm_target=cuda,cudnn_split=transpose          61   16.3961
G_tvm_target=cuda,cudnn_split=both               49   20.4347
```

(N.B. I think I'm compiling the CUTLASS target wrong, because the results are suspiciously close to the regular CUDA target)
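For reference, the splitting trick above relies on the fact that a grouped conv2d is numerically equivalent to slicing the input and weight channels per group, running an ordinary (non-grouped) conv2d on each slice, and concatenating the outputs along the channel axis. The sketch below is not TVM code, just a minimal pure-Python reference (stride 1, no padding, NCHW-style `[C][H][W]` layout with the batch dimension dropped) that demonstrates the equivalence the rewrite depends on:

```python
def conv2d(x, w):
    # Plain conv2d. x: [C_in][H][W], w: [C_out][C_in][KH][KW].
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    c_out, kh, kw = len(w), len(w[0][0]), len(w[0][0][0])
    oh, ow = h - kh + 1, wd - kw + 1
    out = [[[0.0] * ow for _ in range(oh)] for _ in range(c_out)]
    for co in range(c_out):
        for i in range(oh):
            for j in range(ow):
                s = 0.0
                for ci in range(c_in):
                    for a in range(kh):
                        for b in range(kw):
                            s += x[ci][i + a][j + b] * w[co][ci][a][b]
                out[co][i][j] = s
    return out


def grouped_conv2d(x, w, groups):
    # Direct grouped conv2d: w is [C_out][C_in // groups][KH][KW] and
    # output channel co only reads input channels of its own group.
    c_in, c_out = len(x), len(w)
    cig, cog = c_in // groups, c_out // groups
    kh, kw = len(w[0][0]), len(w[0][0][0])
    oh, ow = len(x[0]) - kh + 1, len(x[0][0]) - kw + 1
    out = [[[0.0] * ow for _ in range(oh)] for _ in range(c_out)]
    for co in range(c_out):
        g = co // cog  # which group this output channel belongs to
        for i in range(oh):
            for j in range(ow):
                s = 0.0
                for ci in range(cig):
                    for a in range(kh):
                        for b in range(kw):
                            s += x[g * cig + ci][i + a][j + b] * w[co][ci][a][b]
                out[co][i][j] = s
    return out


def split_grouped_conv2d(x, w, groups):
    # The split rewrite: one non-grouped conv2d per group, outputs
    # concatenated along the channel axis.
    cig, cog = len(x) // groups, len(w) // groups
    out = []
    for g in range(groups):
        out.extend(conv2d(x[g * cig:(g + 1) * cig],
                          w[g * cog:(g + 1) * cog]))
    return out
```

Both functions accumulate in the same channel/kernel order, so the two results match exactly; in a real pipeline the split version just exposes each per-group conv to whatever schedules and libraries handle plain conv2d well.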
