This epsilon affects the baseline calculation, not the MKL-DNN calculation. The smaller the epsilon is, the more accurate the baseline (the gradient modeled on Theano) becomes, so this change won't make the test worse.
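To illustrate the point about epsilon, here is a minimal sketch of a finite-difference baseline. It is not MXNet's test helper: `numeric_grad`, `f`, and `max_error` are hypothetical names, the central-difference form is an assumption, and the cubic test function is chosen only because its exact gradient is known.

```python
def numeric_grad(f, x, eps):
    """Central-difference estimate of the gradient of f at x (a list of floats)."""
    grad = []
    for i in range(len(x)):
        plus = list(x)
        plus[i] += eps
        minus = list(x)
        minus[i] -= eps
        grad.append((f(plus) - f(minus)) / (2.0 * eps))
    return grad


def f(x):
    # Hypothetical test function: sum of cubes, exact gradient is 3 * x**2.
    return sum(v ** 3 for v in x)


def max_error(eps, x=(0.5, -1.25, 2.0)):
    """Largest absolute gap between the numeric baseline and the exact gradient."""
    exact = [3.0 * v ** 2 for v in x]
    approx = numeric_grad(f, list(x), eps)
    return max(abs(a - e) for a, e in zip(approx, exact))


if __name__ == "__main__":
    # Only the baseline depends on eps; the operator under test is not involved.
    for eps in (1e-1, 1e-2, 1e-3):
        print(eps, max_error(eps))
```

For the step sizes shown, the baseline error shrinks as epsilon shrinks. In floating point this trend holds only until rounding error starts to dominate, so extremely small steps are outside the scope of this sketch.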
[ Full content available at: https://github.com/apache/incubator-mxnet/pull/12418 ]
