@szha: I'm happy to make a PR. One question about the behaviour: the flag `MXNET_CUDA_ALLOW_TENSOR_CORE` is set to `true` by default. Its behaviour is:

- Only use TensorCores if `DType` is `float16` and `MXNET_CUDA_ALLOW_TENSOR_CORE` is `true`. Presumably, the use of TensorCores will never worsen `float16` training, so defaulting to TensorCore use seems reasonable.
- But if `DType` is `float32`, then we can't assume the user wants to use TensorCores. They do, however, want a way of opting in (and of setting `CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION` when available).

So how about an environment variable `MXNET_CUDA_TENSOR_OP_MATH_ALLOW_CONVERSION`, which defaults to `false` and, if `true`, lets `float32` or `float64` nets use TensorCores by implicit downcasting?

[ Full content available at: https://github.com/apache/incubator-mxnet/issues/9543 ]
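To make the proposal concrete, here is a rough Python sketch of the decision logic I have in mind. It is not MXNet code: `env_flag` and `choose_math_type` are hypothetical helpers, the boolean parsing rule is an assumption, and I am also assuming the new variable only takes effect while `MXNET_CUDA_ALLOW_TENSOR_CORE` is still `true`. The returned strings are the names of the cuDNN math types.

```python
import os


def env_flag(name, default, env=None):
    """Read a boolean flag from the environment.

    Assumption: "1" and "true" (case-insensitive) count as true.
    MXNet's real parsing may differ.
    """
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true")


def choose_math_type(dtype, env=None):
    """Pick a cuDNN math type name for a given dtype (hypothetical helper)."""
    allow_tc = env_flag("MXNET_CUDA_ALLOW_TENSOR_CORE", True, env)
    # Proposed variable: opt-in only, so it defaults to False.
    allow_conv = env_flag("MXNET_CUDA_TENSOR_OP_MATH_ALLOW_CONVERSION", False, env)

    if not allow_tc:
        # Assumed: the existing flag stays the master switch.
        return "CUDNN_DEFAULT_MATH"
    if dtype == "float16":
        # Current behaviour: float16 uses TensorCores by default.
        return "CUDNN_TENSOR_OP_MATH"
    if dtype in ("float32", "float64") and allow_conv:
        # Proposed behaviour: implicit downcasting only when opted in.
        return "CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION"
    return "CUDNN_DEFAULT_MATH"
```

With nothing set in the environment, `float16` would still get TensorCores and `float32` would not, so existing users would see no change unless they opt in.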
