comaniac opened a new pull request #6132:
URL: https://github.com/apache/incubator-tvm/pull/6132
Although we register `conv2d_cudnn.cuda` and `dense_cublas.cuda` as AutoTVM
tasks so that they can be "tuned" and compared with other implementations,
some issues prevent us from actually "tuning" them.
- `conv2d_cudnn.cuda`: I consistently got the following error in
`cudnnFindConvolutionForwardAlgorithm` (on a T4 GPU). Note that this function is
called without issues when extracting tasks, so I guess it might be an issue
with the CUDA context and threading.
```
cuDNN: Check failed: e == CUDNN_STATUS_SUCCESS (2 vs. 0) :
CUDNN_STATUS_ALLOC_FAILED
```
The solution in this PR is to create a knob so that it becomes a template
with 8 candidates. In this case, `cudnnFindConvolutionForwardAlgorithm` will
not be called during tuning and everything works well. We still set the knob
value to `-1` in the fallback config to achieve the same behavior as now.
- `dense_cublas.cuda`: The error comes from the callback function that tries
to display the FLOPS. The reason is that `task.flops` is a `FloatImm` instead of
a `float`, so `float(flops)` throws a type error. This PR lets the `add_flop`
function support `FloatImm` and `IntImm` types.
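For reviewers, here is a minimal sketch of the two changes in plain Python. It is not the code in this PR: `FloatImm` and `IntImm` below are hypothetical stand-ins for the TVM node types (assumed to expose the wrapped number as `.value`), and `FlopCounter`, `to_python_number`, `choose_algo`, and `find_best_algo` are names made up for illustration.

```python
# Hypothetical stand-ins for the TVM immediate node types; assumed to
# expose the wrapped Python number through a `.value` attribute.
class FloatImm:
    def __init__(self, value):
        self.value = float(value)


class IntImm:
    def __init__(self, value):
        self.value = int(value)


def to_python_number(flop):
    """Unwrap an immediate node; pass plain numbers through unchanged."""
    if isinstance(flop, (FloatImm, IntImm)):
        return flop.value
    return flop


class FlopCounter:
    """Illustrative accumulator that mimics what `add_flop` needs to do."""

    def __init__(self):
        self.flop = 0.0

    def add_flop(self, flop):
        # Unwrap first so that float() only ever sees a Python number.
        self.flop += float(to_python_number(flop))


NUM_ALGO_CANDIDATES = 8  # size of the knob described above
FALLBACK_ALGO = -1       # knob value used by the fallback config


def choose_algo(knob_value, find_best_algo):
    """Pick the forward algorithm index from the knob value.

    `find_best_algo` is a hypothetical callable standing in for the
    cuDNN search; it is only invoked for the fallback value.
    """
    if knob_value == FALLBACK_ALGO:
        return find_best_algo()
    if not 0 <= knob_value < NUM_ALGO_CANDIDATES:
        raise ValueError("algorithm knob out of range: %d" % knob_value)
    return knob_value
```

With this shape, tuning only ever passes knob values `0` to `7`, so the search function is never reached, while the fallback config keeps the current behavior.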
cc @icemelon9 @merrymercy