This test validates the activation gradient computed by mkldnn by checking it
against a numeric gradient computed in the style of theano.gradient.numeric_grad.
However, that theano-style numeric gradient is not correct when an input value is
close to zero. As a result, flaky failures occur whenever the random input vector
contains extremely small positive numbers.
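For context, here is a minimal sketch of the kind of gradient check involved, assuming the `mxnet.test_utils.check_numeric_gradient` API with a dense ReLU `Activation` symbol; the actual test builds the input as a RowSparseNDArray and may set different tolerances, so names and arguments here are illustrative only:

```python
import mxnet as mx
from mxnet.test_utils import check_numeric_gradient

# Hypothetical reproduction of the setup described above; not the exact test code.
data = mx.sym.Variable('data')
sym = mx.sym.Activation(data=data, act_type='relu')

# Input containing an extremely small positive entry, as in experiment 1 below.
x = mx.nd.array([[1, 2], [3, 0.0001]])

# Compares the symbol's backward pass against a finite-difference estimate,
# projected through a random tensor (the '__random_proj' seen in the logs).
check_numeric_gradient(sym, location={'data': x})
```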
The experiments are as follows.
## experiment 1:
input data: [[1, 2], [3, 0.0001]]
location:
{'data':
<RowSparseNDArray 2x2 @cpu(0)>, '__random_proj':
[[0.3546685 0.8954062 ]
[0.40476447 0.7724642 ]]
<NDArray 2x2 @cpu(0)>}
gradient calculation referring to theano:
[[0.35466552 0.8954048 ]
[0.40476322 0.39395675]]
mkldnn:
[[0.3546685 0.8954062 ]
[0.40476447 0.7724642 ]]
## experiment 2:
input data: [[1, -2], [-4, 0.0005]]
location:
{'data':
<RowSparseNDArray 2x2 @cpu(0)>, '__random_proj':
[[0.3546685 0.8954062 ]
[0.40476447 0.7724642 ]]
<NDArray 2x2 @cpu(0)>}
gradient calculation referring to theano:
[[0.35466552 0. ]
[0. 0.4248553 ]]
mkldnn:
[[0.3546685 0. ]
[0. 0.7724642]]
## analysis
The derivative of the ReLU function is 0 for x < 0 and 1 for x > 0.
Therefore, in the check_numeric_gradient function, the executor's gradient should,
element-wise, equal the corresponding __random_proj value in location when the
input element is positive, and be 0 otherwise. That is exactly what mkldnn returns
in both experiments.
The theano-style numeric gradient is clearly wrong when the corresponding element
of the input data is close to zero: the finite-difference step straddles the kink
of ReLU at 0, so the estimate lands roughly halfway between 0 and the true value
(0.394 instead of 0.772 in experiment 1, 0.425 instead of 0.772 in experiment 2),
as the sketch below shows.
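A small, self-contained sketch of this effect, using the values from the experiments above. The symmetric difference (f(x + eps/2) - f(x - eps/2)) / eps and the step eps = 1e-2 are assumptions about how the numeric gradient is computed, but with these choices the sketch reproduces the reported numbers:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

proj = np.array([[0.3546685, 0.8954062],     # __random_proj from the logs
                 [0.40476447, 0.7724642]])
x = np.array([[1.0, 2.0], [3.0, 0.0001]])    # experiment 1 input
eps = 1e-2                                   # assumed finite-difference step

# Analytic gradient of sum(relu(x) * proj): proj where x > 0, else 0.
analytic = proj * (x > 0)

# Symmetric finite difference, one element at a time.
numeric = np.empty_like(x)
for idx in np.ndindex(x.shape):
    xp, xn = x.copy(), x.copy()
    xp[idx] += eps / 2
    xn[idx] -= eps / 2
    numeric[idx] = (np.sum(relu(xp) * proj) - np.sum(relu(xn) * proj)) / eps

print(analytic[1, 1])  # 0.7724642  -> matches the mkldnn result
print(numeric[1, 1])   # ~0.3939... -> matches the flaky "theano" result,
                       # because 0.0001 +/- eps/2 straddles the kink at 0
```

The closer the small positive input is to zero relative to eps/2, the further the numeric estimate drifts from the true gradient, which is why only inputs extremely close to zero trigger the flakiness.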
[ Full content available at: https://github.com/apache/incubator-mxnet/issues/12377 ]