A sparse softmax operator would be useful for implementing the Transformer model. TensorFlow provides one as `tf.sparse_softmax`: https://www.tensorflow.org/api_docs/python/tf/sparse_softmax
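To make the request concrete, here is a minimal pure-Python sketch of the semantics TensorFlow documents for this op, specialized to a 2-D input. Softmax is taken along the last dimension over the explicitly stored entries only. Implicit (missing) entries do not participate, and the output keeps the input's sparsity pattern. The function name `sparse_softmax` and the COO `(indices, values, shape)` layout are assumptions for illustration; they are not an existing MXNet or TensorFlow API.

```python
import math
from collections import defaultdict


def sparse_softmax(indices, values, shape):
    """Hypothetical sketch: row-wise softmax over the stored entries of a 2-D COO matrix.

    indices: list of (row, col) pairs; values: matching list of float logits;
    shape: (rows, cols). Missing entries behave like -inf logits, so they get
    no probability mass, and the result has the same sparsity pattern as the input.
    """
    rows, cols = shape
    by_row = defaultdict(list)  # row index -> positions of its entries in `values`
    for pos, (r, c) in enumerate(indices):
        if not (0 <= r < rows and 0 <= c < cols):
            raise IndexError(f"index {(r, c)} out of bounds for shape {shape}")
        by_row[r].append(pos)

    out = [0.0] * len(values)
    for positions in by_row.values():
        # Subtract the row maximum before exponentiating, for numerical stability.
        m = max(values[p] for p in positions)
        exps = {p: math.exp(values[p] - m) for p in positions}
        total = sum(exps.values())
        for p, e in exps.items():
            out[p] = e / total
    return out


# Row 0 stores entries at columns 0 and 2; row 1 stores a single entry.
print(sparse_softmax([(0, 0), (0, 2), (1, 1)], [1.0, 1.0, 5.0], (2, 3)))
```

This prints `[0.5, 0.5, 1.0]`. A dense softmax over row 0 (logits `[1, 0, 1]`) would instead assign some probability to the implicit zero at column 1. That difference is what a dedicated sparse operator would provide.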
[ Full content available at: https://github.com/apache/incubator-mxnet/issues/12729 ]
