This test validates the activation (ReLU) gradient computed by MKLDNN by checking it
against the numeric gradient based on theano.gradient.numeric_grad. However, the
Theano-style numeric gradient is not correct when an input element is close to zero.
As a result, flaky failures occur whenever the random input vector happens to contain
extremely small positive numbers.
The experiments below demonstrate this.
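
The failure mode can be reproduced outside MXNet with a few lines of NumPy. The sketch below is only illustrative (the step size `eps` is an assumption, not the value used by check_numeric_gradient): a symmetric finite-difference estimate of the ReLU derivative at a tiny positive input straddles zero and returns a value between 0 and 1 instead of the true value 1.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def numeric_relu_grad(x, eps=1e-3):
    # Symmetric (central) finite-difference estimate of d relu(x) / dx.
    return (relu(x + eps / 2) - relu(x - eps / 2)) / eps

x = 1e-4                        # extremely small positive input
print(numeric_relu_grad(x))     # ~0.6: the step straddles zero, so the estimate is off
print(1.0 if x > 0 else 0.0)    # analytic derivative: 1.0
```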

## experiment 1:

input data: [[1, 2], [3, 0.0001]]

location:
{'data':
<RowSparseNDArray 2x2 @cpu(0)>, '__random_proj':
[[0.3546685  0.8954062 ]
 [0.40476447 0.7724642 ]]
<NDArray 2x2 @cpu(0)>}

theano gradient :
[[0.35466552 0.8954048 ]
 [0.40476322 0.39395675]]

mkldnn :
[[0.3546685  0.8954062 ]
 [0.40476447 0.7724642 ]]

## experiment 2:
input data: [[1, -2], [-4, 0.0005]]

location:
{'data':
<RowSparseNDArray 2x2 @cpu(0)>, '__random_proj':
[[0.3546685  0.8954062 ]
 [0.40476447 0.7724642 ]]
<NDArray 2x2 @cpu(0)>}

theano gradient :
[[0.35466552 0.        ]
 [0.         0.4248553 ]]

mkldnn :
second argument: [[0.3546685 0.       ]
 [0.        0.7724642]]

## analysis
The derivative of the ReLU function is 0 for x < 0 and 1 for x > 0 (it is undefined at x = 0).

Therefore, in the check_numeric_gradient function, the executor's gradient should, element-wise,
equal the '__random_proj' values in location wherever the corresponding input element is
positive, and be 0 wherever it is negative.
The Theano-based numeric gradient is clearly wrong when the corresponding input element is
close to zero: the finite-difference perturbation straddles zero, so the estimate lands
somewhere between 0 and the projection value (0.394 and 0.425 in the experiments above,
instead of 0.772).
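
To make the expected result concrete, here is a small NumPy sketch using the experiment 1 data above. The names `data` and `random_proj` simply mirror the 'data' and '__random_proj' entries shown in location, and `eps` is chosen only for illustration. The analytic gradient of the projected output sum(relu(data) * random_proj) with respect to data is random_proj * (data > 0), which matches the MKLDNN output; a symmetric numeric estimate gives a fractional value at the near-zero entry, like the Theano gradient above.

```python
import numpy as np

data = np.array([[1.0, 2.0], [3.0, 0.0001]])        # input from experiment 1
random_proj = np.array([[0.3546685, 0.8954062],
                        [0.40476447, 0.7724642]])    # '__random_proj' from location

# Expected (analytic) gradient: projection where the input is positive, 0 otherwise.
expected_grad = random_proj * (data > 0)
print(expected_grad)    # equals the MKLDNN output above

def f(x):
    # Scalar objective: ReLU output projected onto the random vector.
    return np.sum(np.maximum(x, 0.0) * random_proj)

# Symmetric numeric estimate, element by element (eps is an assumption).
eps = 1e-2
numeric_grad = np.empty_like(data)
for i in np.ndindex(data.shape):
    d = np.zeros_like(data)
    d[i] = eps / 2
    numeric_grad[i] = (f(data + d) - f(data - d)) / eps
print(numeric_grad)     # the [1, 1] entry falls between 0 and 0.772, like the Theano gradient
```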


[ Full content available at: https://github.com/apache/incubator-mxnet/issues/12377 ]