Hi Pedro,

These are just helper functions; you need to check the operator. In this case, the function is the derivative as a function of the *output*, which is cheaper to compute:
y = log(1 + exp(x))  =>  dy/dx = 1/(1 + exp(-x)) = 1 - exp(-y)

If you check all sorts of other ops, you will find the same pattern: you always need to check the code for the operator. In any case, there are quite a few unit tests that would catch this, unless of course people added functions after I wrote the tests and did not update them.

Bye,
Matthias

On Wed, Nov 21, 2018 at 12:52 AM Pedro Larroy <pedro.larroy.li...@gmail.com> wrote:
> I bumped into the definition of the softrelu gradient:
>
> https://github.com/apache/incubator-mxnet/blob/master/src/operator/mshadow_op.h#L170
>
> which is defined as 1 - exp(-x).
>
> Since we define the forward of the softrelu as the softplus function,
> shouldn't the gradient be the logistic function?
>
> It is my understanding that the gradient of the softrelu should go
> to zero as x -> -Inf, which is not the case with the above
> definition, which goes to -Inf as x -> -Inf.
>
> https://en.wikipedia.org/wiki/Rectifier_(neural_networks)
>
> Pedro.

--
Matthias Seeger
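
As a quick sanity check of the identity above, here is a minimal numerical sketch (Python/NumPy, not part of MXNet; the function names are illustrative only). It shows that the gradient expressed in terms of the output y, 1 - exp(-y), matches the logistic function of the input x, and that both go to 0 as x -> -Inf:

```python
import numpy as np

def softrelu(x):
    # forward: y = log(1 + exp(x)), computed in a numerically stable way
    return np.maximum(x, 0) + np.log1p(np.exp(-np.abs(x)))

def grad_wrt_input(x):
    # dy/dx written as a function of the input: the logistic function 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def grad_wrt_output(y):
    # the same derivative written as a function of the output: 1 - exp(-y)
    return 1.0 - np.exp(-y)

x = np.linspace(-10.0, 10.0, 101)
y = softrelu(x)

# both expressions agree; near x = -10 both are ~4.5e-5, approaching 0
assert np.allclose(grad_wrt_input(x), grad_wrt_output(y))
```

The point of the output-based form is that the backward pass already has y available, so the gradient costs one exp instead of recomputing anything from x.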