This test validates the activation (ReLU) calculation in MKL-DNN by comparing its backward gradient against a numerical gradient computed in the style of theano.gradient.numeric_grad. However, that Theano-style numerical gradient is not correct when an input element is close to zero. As a result, flaky failures occur whenever the random input vector contains some extremely small positive numbers.
The experiments are as follows.
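
For context, a minimal sketch of the kind of check involved (assuming MXNet's `mx.test_utils.check_numeric_gradient` and a plain ReLU `Activation` symbol; this is not the exact failing test):

```python
import mxnet as mx
import numpy as np

# Symbol whose backward pass (computed by MKL-DNN when enabled) is
# compared against a finite-difference numerical gradient.
data = mx.sym.Variable('data')
relu = mx.sym.Activation(data=data, act_type='relu')

# Random input: if any element lands extremely close to zero, the
# finite-difference step straddles ReLU's kink and the check becomes flaky.
x = mx.nd.array(np.random.uniform(-1, 1, (2, 2)))
mx.test_utils.check_numeric_gradient(relu, [x])
```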

## experiment 1:

input data: [[1, 2], [3, 0.0001]]

location:
{'data':
<RowSparseNDArray 2x2 @cpu(0)>, '__random_proj':
[[0.3546685  0.8954062 ]
 [0.40476447 0.7724642 ]]
<NDArray 2x2 @cpu(0)>}

numerical gradient computed in the Theano style:
[[0.35466552 0.8954048 ]
 [0.40476322 0.39395675]]

mkldnn:
[[0.3546685  0.8954062 ]
 [0.40476447 0.7724642 ]]

## experiment 2:

input data: [[1, -2], [-4, 0.0005]]

location:
{'data':
<RowSparseNDArray 2x2 @cpu(0)>, '__random_proj':
[[0.3546685  0.8954062 ]
 [0.40476447 0.7724642 ]]
<NDArray 2x2 @cpu(0)>}

numerical gradient computed in the Theano style:
[[0.35466552 0.        ]
 [0.         0.4248553 ]]

mkldnn:
[[0.3546685 0.       ]
 [0.        0.7724642]]
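
Both Theano-style results above can be reproduced with a plain half-step central difference; a sketch in NumPy (the step size `eps = 1e-2` is an assumption chosen to match the reported numbers, not a value taken from the test):

```python
import numpy as np

def numeric_grad_relu(x, proj, eps=1e-2):
    # Half-step central difference of sum(proj * relu(x)), element-wise.
    relu = lambda v: np.maximum(v, 0.0)
    return proj * (relu(x + eps / 2) - relu(x - eps / 2)) / eps

proj = np.array([[0.3546685, 0.8954062], [0.40476447, 0.7724642]])

print(numeric_grad_relu(np.array([[1.0, 2.0], [3.0, 0.0001]]), proj))
# ~[[0.3547, 0.8954], [0.4048, 0.3940]]  <- 0.394 instead of 0.772
print(numeric_grad_relu(np.array([[1.0, -2.0], [-4.0, 0.0005]]), proj))
# ~[[0.3547, 0.    ], [0.    , 0.4249]]  <- 0.425 instead of 0.772
```

When |x| is much smaller than the step, the difference quotient evaluates to roughly 0.5 + x/eps instead of 1, which is exactly the discrepancy seen in experiments 1 and 2.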

## analysis
The derivative of the ReLU function is 0 for x < 0 and 1 for x > 0 (it is undefined at x = 0).

Therefore, in the check_numeric_gradient function, the gradient reported by the executor should, element-wise, equal the corresponding element of `__random_proj` (the location) when the input element is positive, and be 0 otherwise.
The Theano-style numerical gradient is clearly wrong when the corresponding input element is close to zero, because the finite-difference step then straddles ReLU's kink at zero.
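
A sketch of that expected element-wise gradient (the projection passed through where the input is positive, zero elsewhere), which matches the MKL-DNN output in both experiments:

```python
import numpy as np

def expected_relu_grad(x, proj):
    # Analytic ReLU gradient: pass proj through where x > 0, zero elsewhere.
    return proj * (x > 0)

proj = np.array([[0.3546685, 0.8954062], [0.40476447, 0.7724642]])

print(expected_relu_grad(np.array([[1.0, 2.0], [3.0, 0.0001]]), proj))
# [[0.3546685  0.8954062 ]
#  [0.40476447 0.7724642 ]]
print(expected_relu_grad(np.array([[1.0, -2.0], [-4.0, 0.0005]]), proj))
# [[0.3546685 0.       ]
#  [0.        0.7724642]]
```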


[ Full content available at: 
https://github.com/apache/incubator-mxnet/issues/12377 ]