DickJC123 opened a new pull request, #21104:
URL: https://github.com/apache/incubator-mxnet/pull/21104
## Description ##
PR #20635, which began using the cuDNN v8 backend API for Convolution ops,
includes the following line to avoid `test_gluon_gpu.py::test_group_conv2d_16c`
failures that began occurring coincident with the PR:
```
@with_environment('MXNET_CUDNN_DISABLED_CONV_FWD_ENGINES', '5')  # eng:5 causes test failure on M60
```
This PR will remove that line by providing a different implementation of the
"convolution plan cache" introduced with PR #20635 that is compatible with
convolution engine 5. The steps of this PR will be:
1. Re-enable convolution engine 5 and demonstrate that the
`test_group_conv2d` failures return, then
2. Upgrade the convolution plan cache so that `test_group_conv2d`
passes even when engine 5 is used.
Further detail:
The cuDNN v8 backend allows one to bypass a lot of CPU processing that might
precede kernel launch by first building up and finalizing a convolution
execution plan. The plan is then executed efficiently by the call
`cudnnBackendExecute(cudnn_handle, plan, ...)`. PR #20635 introduced a cache
of plans so that autotuning does not need to be repeated for
identically-parameterized convolutions, and so that such convolutions are
handled by the same engine even if they exist multiple times in a model or are
handled by different GPU workers.
The issue that was discovered for convolution engine 5 is that it caches a
cuDNN handle provided during the plan's construction, and does not consider the
handle passed as an argument of `cudnnBackendExecute()`. The result is that the
engine's kernels are launched into the stream of the cached handle, and this
would be the incorrect stream if the GPU worker launching the plan is different
from the one that created the plan. Without the proper stream synchronization,
incorrect results may follow.
The contribution of this PR is to effectively include a GPU worker's cuDNN
handle as part of the key used in the cache lookup. A further refinement is
that, on a cache miss, an entry created by a different worker can be 'cloned'
with the proper handle, avoiding a repeat of the autotuning.
## Checklist ##
### Essentials ###
- [X] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL],
[FEATURE], [DOC], etc)
- [ ] Changes are complete (i.e. I finished coding on this PR)
- [X] All changes have test coverage
- [X] Code is well-documented