I have an explanation, but I'll have to think about the best fix. The problem starts with the fact that cudnnFind() does its own workspace allocations and doesn't use MXNet's memory allocator. MXNet anticipates this by setting aside 'headroom' via MXNET_GPU_MEM_POOL_RESERVE (a percentage of total GPU memory). I was able to run your script with repeated allocations on a 16GB GPU by setting MXNET_GPU_MEM_POOL_RESERVE=35, which is about 5.6GB of headroom. On a 12GB GPU, the same headroom would correspond to a value of 47. That's clearly excessive, so we might have to resort to calling the 'Ex' flavor of cudnnFind, which allows for pre-screening of algos that have a workspace greater than the threshold set by the convolution instance 'workspace' param.
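For reference, here is a rough sketch of the arithmetic behind those two numbers, along with how the variable could be set from a Python script. The helper name `equivalent_reserve_percent` is made up for illustration, the calculation uses nominal capacities (16 and 12) rather than what the driver actually reports, and rounding up is my assumption so that the headroom is never smaller than on the reference GPU. Setting the variable before `import mxnet` is also an assumption about the safest ordering, not something I have verified against the allocator code.

```python
import math
import os


def equivalent_reserve_percent(ref_percent, ref_total_gb, target_total_gb):
    """Hypothetical helper: percentage of the target GPU's memory that gives
    the same absolute headroom as ref_percent does on the reference GPU."""
    headroom_gb = ref_total_gb * ref_percent / 100.0
    # Round up so the target headroom is at least as large (an assumption).
    return math.ceil(100.0 * headroom_gb / target_total_gb)


# 35% of a 16GB GPU, expressed as a percentage of a 12GB GPU.
reserve = equivalent_reserve_percent(35, 16, 12)
print(reserve)

# Export the value for this process; do this before importing mxnet.
os.environ["MXNET_GPU_MEM_POOL_RESERVE"] = str(reserve)
```

The same value can of course be exported in the shell before launching the script instead.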
[ Full content available at: https://github.com/apache/incubator-mxnet/issues/12662 ]
