@szha, @DickJC123: TensorCore support for RNNs is [currently disabled](https://github.com/apache/incubator-mxnet/blob/225f71f744ac5e7bd29868b6d3ba0e4fe2527c43/src/operator/cudnn_rnn-inl.h#L47). We would really like to make use of this for training large RNN models.
Are there any plans to add support for it soon? Also, what about adding a flag `cudnn_use_tensorcores` (similar to the cuDNN flags on convolution)? When `true`, it would use TensorCores even with `float32` (handling the internal downcasting, or [using the new](https://devblogs.nvidia.com/tensor-ops-made-easier-in-cudnn/) `cudnnSetRNNMatrixMathType(cudnnRnnDesc, CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION)`). If it is `None`, it would fall back to the standard heuristics (environment variable + check for `float16`).
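
For concreteness, here is a minimal sketch (not MXNet's actual implementation) of how such a flag could be translated into the cuDNN 7.1+ API. The function name `SetRNNMathType`, the tri-state `use_tensor_core` parameter, and the fallback behaviour are illustrative assumptions:

```c++
#include <cudnn.h>

// Hypothetical helper: map a tri-state tensor-core flag onto the RNN
// descriptor's math type.  use_tensor_core: -1 = auto, 0 = off, 1 = on.
inline void SetRNNMathType(cudnnRNNDescriptor_t rnn_desc,
                           cudnnDataType_t dtype,
                           int use_tensor_core) {
#if CUDNN_VERSION >= 7100  // ALLOW_CONVERSION was added in cuDNN 7.1
  cudnnMathType_t math_type = CUDNN_DEFAULT_MATH;
  if (use_tensor_core == 1) {
    // Explicitly requested: allow TensorCores even for float32 inputs;
    // cuDNN performs the internal down-conversion.
    math_type = CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION;
  } else if (use_tensor_core == -1 && dtype == CUDNN_DATA_HALF) {
    // Default heuristic (assumed): only enable TensorCores for float16.
    math_type = CUDNN_TENSOR_OP_MATH;
  }
  if (cudnnSetRNNMatrixMathType(rnn_desc, math_type) != CUDNN_STATUS_SUCCESS) {
    // e.g. pre-Volta GPU: fall back to the default math mode.
    cudnnSetRNNMatrixMathType(rnn_desc, CUDNN_DEFAULT_MATH);
  }
#endif
}
```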
