comaniac opened a new pull request #6132:
URL: https://github.com/apache/incubator-tvm/pull/6132


   Although we register `conv2d_cudnn.cuda` and `dense_cublas.cuda` as AutoTVM 
tasks so that they can be "tuned" and compared with other implementations, 
some issues prevent us from actually "tuning" them.
   
   - `conv2d_cudnn.cuda`: I consistently get the following error in 
`cudnnFindConvolutionForwardAlgorithm` (on a T4 GPU) during tuning. Note that 
the same function is called without issues when extracting tasks, so I suspect 
the problem is related to the CUDA context and threading.
   
       ```
       cuDNN: Check failed: e == CUDNN_STATUS_SUCCESS (2 vs. 0) : CUDNN_STATUS_ALLOC_FAILED
       ```
   
      The solution in this PR is to define a knob so that the task becomes a 
template with 8 candidates. In this case, `cudnnFindConvolutionForwardAlgorithm` 
is not called during tuning and everything works well. We still set the knob 
value to `-1` in the fallback config to keep the current behavior.
   
   - `dense_cublas.cuda`: The error comes from the callback function that tries 
to display the FLOPS. The reason is that `task.flops` is a `FloatImm` instead of 
a `float`, so `float(flops)` throws a type error. This PR lets the `add_flop` 
function support the `FloatImm` and `IntImm` types.
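
   The two fixes above can be sketched in plain Python, without importing TVM. 
This is only an illustration of the idea, not the code in this PR: `FakeImm`, 
`to_python_number`, `pick_algo`, `NUM_ALGOS` and `AUTO_SELECT` are hypothetical 
names, and `FakeImm` merely stands in for an IR immediate such as `FloatImm` or 
`IntImm` that carries its number in a `value` attribute.

```python
# Sketch only: every name below is a hypothetical stand-in, not TVM's API.


class FakeImm:
    """Stand-in for an IR immediate (assumed to expose a `value` attribute)."""

    def __init__(self, value):
        self.value = value


def to_python_number(flop):
    """Unwrap an immediate-like object so that float(...) works on the result."""
    if isinstance(flop, (int, float)):
        return flop  # already a plain Python number
    if hasattr(flop, "value"):
        return flop.value  # immediate-like wrapper: take the wrapped number
    raise TypeError(f"unsupported flop type: {type(flop).__name__}")


NUM_ALGOS = 8     # number of knob candidates mentioned in the description
AUTO_SELECT = -1  # fallback knob value: let the library search, as before


def pick_algo(knob_value, find_best):
    """Map a knob value to an algorithm index.

    `find_best` is a callable standing in for the library's own algorithm
    search; it is only invoked for the fallback value.
    """
    if knob_value == AUTO_SELECT:
        return find_best()
    if not 0 <= knob_value < NUM_ALGOS:
        raise ValueError(f"knob value out of range: {knob_value}")
    return knob_value  # tuning picked a fixed candidate; no search needed
```

   With this shape, tuning only ever passes one of the fixed candidates, so the 
search function is never reached on that path, while the fallback value still 
delegates to it.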
   
   cc @icemelon9 @merrymercy 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

