echuraev commented on pull request #8636:
URL: https://github.com/apache/tvm/pull/8636#issuecomment-911900705


   @mbrookhart I wasn't able to reproduce the regression on an NVIDIA 3070. As a 
possible solution, I can create a separate `conv2d_nhwc` schedule for OpenCL. 
However, I saw that my fix for the OpenCL `global_work_size` issue also works 
for the default schedule on CUDA: 
https://github.com/apache/tvm/pull/8636/files#diff-05fdfdcbc0bdf86e1df35950ae34877c2f9dbddab6a99ca630582547d4e7e0faL88-L89
   
   |                            | Results on NVIDIA 3090 (1024 trials per kernel) | Results on NVIDIA 3070 Mobile (512 trials per kernel) |
   |----------------------------|-------------------------------------------------|-------------------------------------------------------|
   | Main (AutoTVM)             | 0.17 ms                                         | 0.4504 ms                                             |
   | Main (Ansor)               | 0.18 ms                                         | 0.4284 ms                                             |
   | Code from the PR (AutoTVM) | 0.26 ms                                         | 0.4200 ms                                             |
   | Code from the PR (Ansor)   | 0.19 ms                                         | 0.4200 ms                                             |
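
   For reference, the relative change of the PR code against main can be derived from the timings in the table above. This is only a minimal sketch in plain Python: the dictionary layout and the `relative_change` helper are illustrative names, not part of TVM, and the numbers are copied from the table as-is.

   ```python
   # Timings in milliseconds, copied from the table above.
   # Keys are (tuner, device); the structure is an illustrative assumption.
   main_ms = {
       ("AutoTVM", "3090"): 0.17,
       ("Ansor", "3090"): 0.18,
       ("AutoTVM", "3070 Mobile"): 0.4504,
       ("Ansor", "3070 Mobile"): 0.4284,
   }
   pr_ms = {
       ("AutoTVM", "3090"): 0.26,
       ("Ansor", "3090"): 0.19,
       ("AutoTVM", "3070 Mobile"): 0.4200,
       ("Ansor", "3070 Mobile"): 0.4200,
   }


   def relative_change(baseline_ms: float, candidate_ms: float) -> float:
       """Return the change of candidate vs. baseline in percent.

       Positive values mean the candidate is slower than the baseline.
       """
       return (candidate_ms - baseline_ms) / baseline_ms * 100.0


   # Percent change of the PR code relative to main for every configuration.
   changes = {key: relative_change(main_ms[key], pr_ms[key]) for key in main_ms}

   for (tuner, device), pct in changes.items():
       print(f"{tuner:8s} on {device:12s}: {pct:+.1f}%")
   ```

   With these numbers, the 3090 results are slower with the PR code for both tuners, most visibly for AutoTVM, while the 3070 Mobile results are slightly faster with the PR code.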



