Adnios commented on a change in pull request #20372:
URL: https://github.com/apache/incubator-mxnet/pull/20372#discussion_r660493302



##########
File path: src/operator/tensor/elemwise_unary_op_basic.cc
##########
@@ -161,10 +161,34 @@ The storage type of ``log_sigmoid`` output is always dense
 
 )code" ADD_FILELINE)
 .set_attr<FCompute>("FCompute<cpu>", UnaryOp::Compute<cpu, 
mshadow_op::log_sigmoid>)
-.set_attr<nnvm::FGradient>("FGradient", 
ElemwiseGradUseIn{"_backward_log_sigmoid"});

Review comment:
   Hi @szha,
   The reason a scalar array triggers the problem is here:
   
https://github.com/apache/incubator-mxnet/blob/835e25031f847b80277b6d11db0519723d26a80a/src/operator/nn/activation.cc#L126-L140
   - If the input `x` is a scalar array, `SupportMKLDNNAct(param, inputs[0])` returns `false`.
   - If the input `x` is a vector array, `SupportMKLDNNAct(param, inputs[0])` returns `true`, and `MKLDNNActivationBackward` makes it work. Presumably the following code takes effect:
   
https://github.com/apache/incubator-mxnet/blob/835e25031f847b80277b6d11db0519723d26a80a/src/operator/nn/mkldnn/mkldnn_act.cc#L266-L272
   
   There are two solutions that make a scalar array input work:
   1. Make `y` the input of `log_sigmoid_grad`. We can modify the following code, which currently takes `x` as its input; this is what I do in this PR.
   
https://github.com/apache/incubator-mxnet/blob/835e25031f847b80277b6d11db0519723d26a80a/src/operator/mshadow_op.h#L416
         ```c++
         MXNET_UNARY_MATH_OP(log_sigmoid_grad, 1.0f - math::exp(a));
         ```
   2. Keep `log_sigmoid_grad` taking `x` as input, and instead change the following code so that it passes `x` to `log_sigmoid_grad`.
   
https://github.com/apache/incubator-mxnet/blob/835e25031f847b80277b6d11db0519723d26a80a/src/operator/nn/activation-inl.h#L207-L210
      ```c++
      case activation::kLogSigmoid:
        ActivationBackward<xpu, mshadow_op::log_sigmoid, mshadow_op::log_sigmoid_grad>(
            ctx, inputs[0], inputs[2], req[0], outputs[0]);
        break;
      ```
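   The math behind solution 1 can be checked with a quick standalone sketch (hypothetical code, not part of MXNet): for `y = log_sigmoid(x) = -log(1 + exp(-x))`, the derivative `1 - sigmoid(x)` equals `1 - exp(y)`, so the gradient is computable from the output alone.

   ```cpp
   #include <cmath>
   #include <cstdio>

   // Standalone numeric check (hypothetical, not MXNet code):
   // y = log_sigmoid(x) = -log(1 + exp(-x))
   // dy/dx = 1 - sigmoid(x) = 1 - exp(y)
   static double log_sigmoid(double x) { return -std::log1p(std::exp(-x)); }

   int main() {
     const double xs[] = {-2.0, 0.0, 3.0};
     bool ok = true;
     for (double x : xs) {
       double y = log_sigmoid(x);
       double grad_from_x = 1.0 - 1.0 / (1.0 + std::exp(-x));  // needs x
       double grad_from_y = 1.0 - std::exp(y);                 // needs only y
       ok = ok && std::fabs(grad_from_x - grad_from_y) < 1e-12;
     }
     std::printf("%s\n", ok ? "OK" : "FAIL");
     return 0;
   }
   ```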
   
   I think solution 1 is better. For `y = log_sigmoid(x)`, it calculates `dx` from `(dy, y)` instead of `(dy, x)`, which enables the in-place operation `y = log_sigmoid(x)` (i.e. `y` and `x` share the same memory).
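   To illustrate the in-place point with a hypothetical sketch (not MXNet code): once `y` overwrites `x`'s buffer in the forward pass, a backward formula that needs only `(dy, y)` still has everything it requires, while one needing `x` does not.

   ```cpp
   #include <cmath>
   #include <cstdio>

   int main() {
     double x = 1.5;
     // Reference gradient, computed while x is still available.
     double expected = 1.0 - 1.0 / (1.0 + std::exp(-x));  // 1 - sigmoid(x)

     double buf = x;                     // single buffer shared by x and y
     buf = -std::log1p(std::exp(-buf));  // in-place forward: buf now holds y, x is gone
     double dy = 1.0;
     double dx = dy * (1.0 - std::exp(buf));  // backward from (dy, y) alone
     std::printf("%s\n", std::fabs(dx - expected) < 1e-12 ? "OK" : "FAIL");
     return 0;
   }
   ```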
   
   Another problem arose when I adopted solution 1: the gradient of `sym.log_sigmoid()` became wrong. The reason is that the input of `_backward_log_sigmoid` was still `x`, while under solution 1 it should be `y`. The source code of `sym.log_sigmoid()` is the following:
   
https://github.com/apache/incubator-mxnet/blob/835e25031f847b80277b6d11db0519723d26a80a/src/operator/tensor/elemwise_unary_op_basic.cc#L152-L167
   So I changed it to `ElemwiseGradUseOut`, following the source code of `sym.sigmoid()`.
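   `sym.sigmoid()` is a natural precedent here because sigmoid's gradient is likewise expressible from the output alone: for `y = sigmoid(x)`, `sigmoid'(x) = y * (1 - y)`, which is why it registers `ElemwiseGradUseOut`. A quick standalone check (hypothetical code, not part of MXNet):

   ```cpp
   #include <cmath>
   #include <cstdio>

   int main() {
     const double xs[] = {-1.0, 0.5, 2.0};
     bool ok = true;
     for (double x : xs) {
       double y = 1.0 / (1.0 + std::exp(-x));  // y = sigmoid(x)
       // Direct derivative: exp(-x) / (1 + exp(-x))^2, needs x.
       double grad_from_x = std::exp(-x) / std::pow(1.0 + std::exp(-x), 2);
       double grad_from_y = y * (1.0 - y);     // output-only form
       ok = ok && std::fabs(grad_from_x - grad_from_y) < 1e-12;
     }
     std::printf("%s\n", ok ? "OK" : "FAIL");
     return 0;
   }
   ```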



