larroy commented on issue #15120: [bug] fix higher grad log 
URL: https://github.com/apache/incubator-mxnet/pull/15120#issuecomment-499261023
 
 
   > @kshitij12345 I have some question about the equation `expected_head_grad = (grad_op(x) * head_grad_grads).asnumpy()` in your test.
   > 
   > My understanding from the chain rule:
   > 
   > ```
   > Given y =f(x)
   > dL/dx = dL/dy * dy/dx -->  this is the first forward pass. Let dL/dy be y_grad, we get dL/dx (noted as x_grad)
   > 
   > Now we rewrite the above the equation:
   > 
   > input0: y_grad
   > input1: x
   > output: x_grad = y_grad * f'(x)
   > 
   > Another backward pass for this would be:
   > dL/d y_grad = dL/d x_grad * f'(x)
   > dL/dx = dL/d x_grad * y_grad * f''(x)
   > ```
   > 
   > What is the meaning of dL/d y_grad? Are we treating y_grad as another input variable here?
   > 
   > Many thanks for your clarification.
   
   
   
   
   I think introducing L (the loss) is confusing here. With reverse
   accumulation of gradients and the chain rule, every backward pass receives
   an incoming gradient (also called the head gradient or output gradient). So
   the second backward pass should calculate:
   
![equation](https://latex.codecogs.com/gif.download?y%20%3D%20f%28x%29%20%5Crightarrow%20%5Cfrac%7Bd%5E2y%7D%7Bdx%5E2%7D%20%5Crightarrow%20ograd%20*%20f%27%27%28x%29)
   
   Maybe the problem is that we reuse the head gradient of the first gradient
   in the second gradient, which we should not do. Shouldn't the two head
   gradients be independent variables?
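
To make the question concrete, here is a sketch of the chain rule with two separate head gradients. It assumes that the second pass differentiates `x_grad = ograd * f'(x)` and receives its own head gradient `ograd2`; the names follow the graph below, and the formulas are plain calculus, not taken from the MXNet source.

```latex
\begin{aligned}
\text{first backward:}\quad
  & \mathit{x\_grad} = \mathit{ograd} \cdot f'(x) \\
\text{second backward w.r.t. } x:\quad
  & \mathit{ograd2} \cdot \frac{\partial\, \mathit{x\_grad}}{\partial x}
    = \mathit{ograd2} \cdot \mathit{ograd} \cdot f''(x) \\
\text{second backward w.r.t. } \mathit{ograd}:\quad
  & \mathit{ograd2} \cdot \frac{\partial\, \mathit{x\_grad}}{\partial\, \mathit{ograd}}
    = \mathit{ograd2} \cdot f'(x) \\
\text{for } f = \log:\quad
  & f'(x) = \frac{1}{x}, \qquad f''(x) = -\frac{1}{x^{2}}
\end{aligned}
```

With `ograd = 1`, the term with respect to `x` reduces to a single head gradient times `f''(x)`. The term with respect to `ograd` is what the quoted question calls `dL/d y_grad`.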
   
   <svg xmlns="http://www.w3.org/2000/svg" 
xmlns:xlink="http://www.w3.org/1999/xlink" width="350pt" height="274pt" 
viewBox="0.00 0.00 350.00 273.80">
   <g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 
269.7972)">
   <title>G</title>
   <polygon fill="#ffffff" stroke="transparent" points="-4,4 -4,-269.7972 
346,-269.7972 346,4 -4,4"></polygon>
   <g id="clust1" class="cluster">
   <title>cluster_1</title>
   <polygon fill="#f0ffff" stroke="#f0ffff" points="8,-8 8,-257.7972 
78,-257.7972 78,-8 8,-8"></polygon>
   <text text-anchor="middle" x="43" y="-241.1972" font-family="Times,serif" 
font-size="14.00" fill="#000000">Fwd</text>
   </g>
   <g id="clust2" class="cluster">
   <title>cluster_2</title>
   <polygon fill="#e0eeee" stroke="#e0eeee" points="86,-8 86,-257.7972 
178,-257.7972 178,-8 86,-8"></polygon>
   <text text-anchor="middle" x="132" y="-241.1972" font-family="Times,serif" 
font-size="14.00" fill="#000000">bwd</text>
   </g>
   <g id="clust3" class="cluster">
   <title>cluster_3</title>
   <polygon fill="#c1cdcd" stroke="#c1cdcd" points="186,-8 186,-257.7972 
334,-257.7972 334,-8 186,-8"></polygon>
   <text text-anchor="middle" x="260" y="-241.1972" font-family="Times,serif" 
font-size="14.00" fill="#000000">bwd</text>
   </g>
   <!-- x -->
   <g id="node1" class="node">
   <title>x</title>
   <ellipse fill="none" stroke="#000000" cx="43" cy="-206.9972" rx="27" 
ry="18"></ellipse>
   <text text-anchor="middle" x="43" y="-202.7972" font-family="Times,serif" 
font-size="14.00" fill="#000000">x</text>
   </g>
   <!-- log -->
   <g id="node2" class="node">
   <title>log</title>
   <polygon fill="none" stroke="#000000" points="61,-138.4986 25,-138.4986 
25,-102.4986 61,-102.4986 61,-138.4986"></polygon>
   <text text-anchor="middle" x="43" y="-116.2986" font-family="Times,serif" 
font-size="14.00" fill="#000000">log</text>
   </g>
   <!-- x&#45;&gt;log -->
   <g id="edge1" class="edge">
   <title>x-&gt;log</title>
   <path fill="none" stroke="#000000" d="M43,-188.6531C43,-177.1075 43,-161.97 
43,-148.8942"></path>
   <polygon fill="#000000" stroke="#000000" points="46.5001,-148.5196 
43,-138.5197 39.5001,-148.5197 46.5001,-148.5196"></polygon>
   </g>
   <!-- log_bwd -->
   <g id="node5" class="node">
   <title>log_bwd</title>
   <polygon fill="none" stroke="#000000" points="161.4972,-152.9958 
96.5028,-152.9958 96.5028,-88.0014 161.4972,-88.0014 
161.4972,-152.9958"></polygon>
   <text text-anchor="middle" x="129" y="-116.2986" font-family="Times,serif" 
font-size="14.00" fill="#000000">log_bwd</text>
   </g>
   <!-- x&#45;&gt;log_bwd -->
   <g id="edge4" class="edge">
   <title>x-&gt;log_bwd</title>
   <path fill="none" stroke="#000000" d="M57.9801,-191.9303C66.665,-183.195 
78.051,-171.743 89.1549,-160.5747"></path>
   <polygon fill="#000000" stroke="#000000" points="91.8112,-162.8672 
96.3798,-153.3079 86.8471,-157.9317 91.8112,-162.8672"></polygon>
   </g>
   <!-- log_bwd_bwd -->
   <g id="node8" class="node">
   <title>log_bwd_bwd</title>
   <ellipse fill="none" stroke="#000000" cx="260" cy="-120.4986" rx="66.082" 
ry="18"></ellipse>
   <text text-anchor="middle" x="260" y="-116.2986" font-family="Times,serif" 
font-size="14.00" fill="#000000">log_bwd_bwd</text>
   </g>
   <!-- x&#45;&gt;log_bwd_bwd -->
   <g id="edge7" class="edge">
   <title>x-&gt;log_bwd_bwd</title>
   <path fill="none" stroke="#000000" d="M65.1273,-196.3801C70.6013,-193.8659 
76.4799,-191.2597 82,-188.9972 125.7079,-171.0828 137.9627,-170.086 
182,-152.9972 192.7894,-148.8103 204.3363,-144.1177 215.1493,-139.6286"></path>
   <polygon fill="#000000" stroke="#000000" points="216.5469,-142.838 
224.4251,-135.7539 213.8487,-136.3788 216.5469,-142.838"></polygon>
   </g>
   <!-- y -->
   <g id="node3" class="node">
   <title>y</title>
   <ellipse fill="none" stroke="#000000" cx="43" cy="-34" rx="27" 
ry="18"></ellipse>
   <text text-anchor="middle" x="43" y="-29.8" font-family="Times,serif" 
font-size="14.00" fill="#000000">y</text>
   </g>
   <!-- log&#45;&gt;y -->
   <g id="edge2" class="edge">
   <title>log-&gt;y</title>
   <path fill="none" stroke="#000000" d="M43,-102.1545C43,-90.6089 43,-75.4714 
43,-62.3956"></path>
   <polygon fill="#000000" stroke="#000000" points="46.5001,-62.021 43,-52.0211 
39.5001,-62.0211 46.5001,-62.021"></polygon>
   </g>
   <!-- ograd -->
   <g id="node4" class="node">
   <title>ograd</title>
   <ellipse fill="none" stroke="#000000" cx="129" cy="-206.9972" rx="33.0469" 
ry="18"></ellipse>
   <text text-anchor="middle" x="129" y="-202.7972" font-family="Times,serif" 
font-size="14.00" fill="#000000">ograd</text>
   </g>
   <!-- ograd&#45;&gt;log_bwd -->
   <g id="edge3" class="edge">
   <title>ograd-&gt;log_bwd</title>
   <path fill="none" stroke="#000000" d="M129,-188.6531C129,-181.1585 
129,-172.1505 129,-163.1753"></path>
   <polygon fill="#000000" stroke="#000000" points="132.5001,-163.1383 
129,-153.1384 125.5001,-163.1384 132.5001,-163.1383"></polygon>
   </g>
   <!-- x_grad -->
   <g id="node6" class="node">
   <title>x_grad</title>
   <ellipse fill="none" stroke="#000000" cx="132" cy="-34" rx="37.7044" 
ry="18"></ellipse>
   <text text-anchor="middle" x="132" y="-29.8" font-family="Times,serif" 
font-size="14.00" fill="#000000">x_grad</text>
   </g>
   <!-- log_bwd&#45;&gt;x_grad -->
   <g id="edge5" class="edge">
   <title>log_bwd-&gt;x_grad</title>
   <path fill="none" stroke="#000000" d="M130.1342,-87.7972C130.4249,-79.4144 
130.7352,-70.4687 131.0175,-62.327"></path>
   <polygon fill="#000000" stroke="#000000" points="134.5199,-62.3175 
131.3687,-52.2021 127.5241,-62.0748 134.5199,-62.3175"></polygon>
   </g>
   <!-- ograd2 -->
   <g id="node7" class="node">
   <title>ograd2</title>
   <ellipse fill="none" stroke="#000000" cx="260" cy="-206.9972" rx="37.7044" 
ry="18"></ellipse>
   <text text-anchor="middle" x="260" y="-202.7972" font-family="Times,serif" 
font-size="14.00" fill="#000000">ograd2</text>
   </g>
   <!-- ograd2&#45;&gt;log_bwd_bwd -->
   <g id="edge6" class="edge">
   <title>ograd2-&gt;log_bwd_bwd</title>
   <path fill="none" stroke="#000000" d="M260,-188.6531C260,-177.1075 
260,-161.97 260,-148.8942"></path>
   <polygon fill="#000000" stroke="#000000" points="263.5001,-148.5196 
260,-138.5197 256.5001,-148.5197 263.5001,-148.5196"></polygon>
   </g>
   <!-- x_grad_grad -->
   <g id="node9" class="node">
   <title>x_grad_grad</title>
   <ellipse fill="none" stroke="#000000" cx="260" cy="-34" rx="59.6781" 
ry="18"></ellipse>
   <text text-anchor="middle" x="260" y="-29.8" font-family="Times,serif" 
font-size="14.00" fill="#000000">x_grad_grad</text>
   </g>
   <!-- log_bwd_bwd&#45;&gt;x_grad_grad -->
   <g id="edge8" class="edge">
   <title>log_bwd_bwd-&gt;x_grad_grad</title>
   <path fill="none" stroke="#000000" d="M260,-102.1545C260,-90.6089 
260,-75.4714 260,-62.3956"></path>
   <polygon fill="#000000" stroke="#000000" points="263.5001,-62.021 
260,-52.0211 256.5001,-62.0211 263.5001,-62.021"></polygon>
   </g>
   </g>
   </svg>
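
The graph above can be checked numerically with a small, framework-free sketch. The function names `log_bwd` and `log_bwd_bwd` mirror the node labels in the graph and are hypothetical helpers, not MXNet operators. The sketch assumes that `log_bwd_bwd` differentiates `x_grad = ograd / x` and takes `ograd2` as an independent head gradient. It compares the analytic result against central finite differences.

```python
def log_bwd(x, ograd):
    """First backward pass of y = log(x): x_grad = ograd * f'(x) = ograd / x."""
    return ograd / x


def log_bwd_bwd(x, ograd, ograd2):
    """Backward pass of log_bwd, with ograd2 as the head gradient on x_grad.

    Returns (x_grad_grad, ograd_grad):
      x_grad_grad = ograd2 * ograd * f''(x), with f''(x) = -1 / x**2
      ograd_grad  = ograd2 * f'(x),          with f'(x)  =  1 / x
    """
    x_grad_grad = ograd2 * ograd * (-1.0 / x ** 2)
    ograd_grad = ograd2 * (1.0 / x)
    return x_grad_grad, ograd_grad


def central_diff(fn, v, eps=1e-6):
    """Central finite-difference approximation of d fn / d v."""
    return (fn(v + eps) - fn(v - eps)) / (2 * eps)


# Arbitrary illustrative values; the two head gradients are independent.
x, ograd, ograd2 = 2.0, 3.0, 5.0

x_grad_grad, ograd_grad = log_bwd_bwd(x, ograd, ograd2)

# Numerical reference: differentiate log_bwd with respect to each input
# and scale by the second head gradient.
num_x = ograd2 * central_diff(lambda v: log_bwd(v, ograd), x)
num_ograd = ograd2 * central_diff(lambda v: log_bwd(x, v), ograd)

print(x_grad_grad, num_x)
print(ograd_grad, num_ograd)
```

Setting `ograd = 1.0` makes `x_grad_grad` equal to `ograd2 * f''(x)`, so the second result depends only on its own head gradient.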
   
