larroy commented on issue #15120: [bug] fix higher grad log
URL: https://github.com/apache/incubator-mxnet/pull/15120#issuecomment-499261023

> @kshitij12345 I have a question about the equation `expected_head_grad = (grad_op(x) * head_grad_grads).asnumpy()` in your test.
>
> My understanding from the chain rule:
>
> ```
> Given y = f(x)
> dL/dx = dL/dy * dy/dx --> this is the first forward pass. Let dL/dy be y_grad, we get dL/dx (noted as x_grad)
>
> Now we rewrite the above equation:
>
> input0: y_grad
> input1: x
> output: x_grad = y_grad * f'(x)
>
> Another backward pass for this would be:
> dL/d y_grad = dL/d x_grad * f'(x)
> dL/dx = dL/d x_grad * y_grad * f''(x)
> ```
>
> What is the meaning of dL/d y_grad? Are we treating y_grad as another input variable here?
>
> Many thanks for your clarification.

I think the introduction of L (the loss) is confusing here. Under backward accumulation of gradients and the chain rule, we always have an incoming gradient (also called the head gradient or output gradient). So the second backward pass should calculate:

I'm thinking that maybe the problem is that we should not reuse the head gradient of the first backward pass in the second backward pass. Shouldn't the two head gradients be independent variables?
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="350pt" height="274pt" viewBox="0.00 0.00 350.00 273.80"> <g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 269.7972)"> <title>G</title> <polygon fill="#ffffff" stroke="transparent" points="-4,4 -4,-269.7972 346,-269.7972 346,4 -4,4"></polygon> <g id="clust1" class="cluster"> <title>cluster_1</title> <polygon fill="#f0ffff" stroke="#f0ffff" points="8,-8 8,-257.7972 78,-257.7972 78,-8 8,-8"></polygon> <text text-anchor="middle" x="43" y="-241.1972" font-family="Times,serif" font-size="14.00" fill="#000000">Fwd</text> </g> <g id="clust2" class="cluster"> <title>cluster_2</title> <polygon fill="#e0eeee" stroke="#e0eeee" points="86,-8 86,-257.7972 178,-257.7972 178,-8 86,-8"></polygon> <text text-anchor="middle" x="132" y="-241.1972" font-family="Times,serif" font-size="14.00" fill="#000000">bwd</text> </g> <g id="clust3" class="cluster"> <title>cluster_3</title> <polygon fill="#c1cdcd" stroke="#c1cdcd" points="186,-8 186,-257.7972 334,-257.7972 334,-8 186,-8"></polygon> <text text-anchor="middle" x="260" y="-241.1972" font-family="Times,serif" font-size="14.00" fill="#000000">bwd</text> </g> <!-- x --> <g id="node1" class="node"> <title>x</title> <ellipse fill="none" stroke="#000000" cx="43" cy="-206.9972" rx="27" ry="18"></ellipse> <text text-anchor="middle" x="43" y="-202.7972" font-family="Times,serif" font-size="14.00" fill="#000000">x</text> </g> <!-- log --> <g id="node2" class="node"> <title>log</title> <polygon fill="none" stroke="#000000" points="61,-138.4986 25,-138.4986 25,-102.4986 61,-102.4986 61,-138.4986"></polygon> <text text-anchor="middle" x="43" y="-116.2986" font-family="Times,serif" font-size="14.00" fill="#000000">log</text> </g> <!-- x->log --> <g id="edge1" class="edge"> <title>x->log</title> <path fill="none" stroke="#000000" d="M43,-188.6531C43,-177.1075 43,-161.97 43,-148.8942"></path> <polygon fill="#000000" stroke="#000000" 
points="46.5001,-148.5196 43,-138.5197 39.5001,-148.5197 46.5001,-148.5196"></polygon> </g> <!-- log_bwd --> <g id="node5" class="node"> <title>log_bwd</title> <polygon fill="none" stroke="#000000" points="161.4972,-152.9958 96.5028,-152.9958 96.5028,-88.0014 161.4972,-88.0014 161.4972,-152.9958"></polygon> <text text-anchor="middle" x="129" y="-116.2986" font-family="Times,serif" font-size="14.00" fill="#000000">log_bwd</text> </g> <!-- x->log_bwd --> <g id="edge4" class="edge"> <title>x->log_bwd</title> <path fill="none" stroke="#000000" d="M57.9801,-191.9303C66.665,-183.195 78.051,-171.743 89.1549,-160.5747"></path> <polygon fill="#000000" stroke="#000000" points="91.8112,-162.8672 96.3798,-153.3079 86.8471,-157.9317 91.8112,-162.8672"></polygon> </g> <!-- log_bwd_bwd --> <g id="node8" class="node"> <title>log_bwd_bwd</title> <ellipse fill="none" stroke="#000000" cx="260" cy="-120.4986" rx="66.082" ry="18"></ellipse> <text text-anchor="middle" x="260" y="-116.2986" font-family="Times,serif" font-size="14.00" fill="#000000">log_bwd_bwd</text> </g> <!-- x->log_bwd_bwd --> <g id="edge7" class="edge"> <title>x->log_bwd_bwd</title> <path fill="none" stroke="#000000" d="M65.1273,-196.3801C70.6013,-193.8659 76.4799,-191.2597 82,-188.9972 125.7079,-171.0828 137.9627,-170.086 182,-152.9972 192.7894,-148.8103 204.3363,-144.1177 215.1493,-139.6286"></path> <polygon fill="#000000" stroke="#000000" points="216.5469,-142.838 224.4251,-135.7539 213.8487,-136.3788 216.5469,-142.838"></polygon> </g> <!-- y --> <g id="node3" class="node"> <title>y</title> <ellipse fill="none" stroke="#000000" cx="43" cy="-34" rx="27" ry="18"></ellipse> <text text-anchor="middle" x="43" y="-29.8" font-family="Times,serif" font-size="14.00" fill="#000000">y</text> </g> <!-- log->y --> <g id="edge2" class="edge"> <title>log->y</title> <path fill="none" stroke="#000000" d="M43,-102.1545C43,-90.6089 43,-75.4714 43,-62.3956"></path> <polygon fill="#000000" stroke="#000000" points="46.5001,-62.021 
43,-52.0211 39.5001,-62.0211 46.5001,-62.021"></polygon> </g> <!-- ograd --> <g id="node4" class="node"> <title>ograd</title> <ellipse fill="none" stroke="#000000" cx="129" cy="-206.9972" rx="33.0469" ry="18"></ellipse> <text text-anchor="middle" x="129" y="-202.7972" font-family="Times,serif" font-size="14.00" fill="#000000">ograd</text> </g> <!-- ograd->log_bwd --> <g id="edge3" class="edge"> <title>ograd->log_bwd</title> <path fill="none" stroke="#000000" d="M129,-188.6531C129,-181.1585 129,-172.1505 129,-163.1753"></path> <polygon fill="#000000" stroke="#000000" points="132.5001,-163.1383 129,-153.1384 125.5001,-163.1384 132.5001,-163.1383"></polygon> </g> <!-- x_grad --> <g id="node6" class="node"> <title>x_grad</title> <ellipse fill="none" stroke="#000000" cx="132" cy="-34" rx="37.7044" ry="18"></ellipse> <text text-anchor="middle" x="132" y="-29.8" font-family="Times,serif" font-size="14.00" fill="#000000">x_grad</text> </g> <!-- log_bwd->x_grad --> <g id="edge5" class="edge"> <title>log_bwd->x_grad</title> <path fill="none" stroke="#000000" d="M130.1342,-87.7972C130.4249,-79.4144 130.7352,-70.4687 131.0175,-62.327"></path> <polygon fill="#000000" stroke="#000000" points="134.5199,-62.3175 131.3687,-52.2021 127.5241,-62.0748 134.5199,-62.3175"></polygon> </g> <!-- ograd2 --> <g id="node7" class="node"> <title>ograd2</title> <ellipse fill="none" stroke="#000000" cx="260" cy="-206.9972" rx="37.7044" ry="18"></ellipse> <text text-anchor="middle" x="260" y="-202.7972" font-family="Times,serif" font-size="14.00" fill="#000000">ograd2</text> </g> <!-- ograd2->log_bwd_bwd --> <g id="edge6" class="edge"> <title>ograd2->log_bwd_bwd</title> <path fill="none" stroke="#000000" d="M260,-188.6531C260,-177.1075 260,-161.97 260,-148.8942"></path> <polygon fill="#000000" stroke="#000000" points="263.5001,-148.5196 260,-138.5197 256.5001,-148.5197 263.5001,-148.5196"></polygon> </g> <!-- x_grad_grad --> <g id="node9" class="node"> <title>x_grad_grad</title> <ellipse 
fill="none" stroke="#000000" cx="260" cy="-34" rx="59.6781" ry="18"></ellipse> <text text-anchor="middle" x="260" y="-29.8" font-family="Times,serif" font-size="14.00" fill="#000000">x_grad_grad</text> </g> <!-- log_bwd_bwd->x_grad_grad --> <g id="edge8" class="edge"> <title>log_bwd_bwd->x_grad_grad</title> <path fill="none" stroke="#000000" d="M260,-102.1545C260,-90.6089 260,-75.4714 260,-62.3956"></path> <polygon fill="#000000" stroke="#000000" points="263.5001,-62.021 260,-52.0211 256.5001,-62.0211 263.5001,-62.021"></polygon> </g> </g> </svg>
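To make the second-order chain rule concrete for f(x) = log(x), here is a minimal NumPy sketch. It does not use MXNet.

The sketch follows the graph above in two ways:

* It treats the first backward pass as a function `log_bwd(x, ograd) = ograd * f'(x)` with two inputs.
* It gives the second backward pass its own head gradient `ograd2`, which is assumed to be independent of `ograd`.

Under that assumption, the second pass yields two gradients:

* `ograd2 * f'(x)` with respect to `ograd`.
* `ograd2 * ograd * f''(x)` with respect to `x`.

All function names are hypothetical. Reading `grad_op(x)` in the test as f'(x) = 1/x, and `head_grad_grads` as `ograd2`, is also an assumption. A central finite-difference check compares the analytic formulas against numerical derivatives.

```python
import numpy as np

# Hypothetical helper names; none of these are MXNet APIs.
# The forward op is f(x) = log(x), so f'(x) = 1/x and f''(x) = -1/x**2.

def log_bwd(x, ograd):
    """First backward pass: x_grad = ograd * f'(x). Inputs are x and ograd."""
    return ograd / x

def log_bwd_bwd(x, ograd, ograd2):
    """Second backward pass with its own head gradient ograd2.

    Returns the gradients of sum(ograd2 * log_bwd(x, ograd)) with respect
    to each input of log_bwd, as (d/d ograd, d/d x).
    """
    ograd_grad = ograd2 / x                # ograd2 * f'(x)
    x_grad_grad = -ograd2 * ograd / x**2   # ograd2 * ograd * f''(x)
    return ograd_grad, x_grad_grad

def numeric_bwd_bwd(x, ograd, ograd2, eps=1e-6):
    """Central finite differences of the same scalar, used as a check."""
    def objective(x_, ograd_):
        return np.sum(ograd2 * log_bwd(x_, ograd_))

    ograd_grad = np.zeros_like(ograd)
    x_grad_grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step.flat[i] = eps
        ograd_grad.flat[i] = (objective(x, ograd + step)
                              - objective(x, ograd - step)) / (2 * eps)
        x_grad_grad.flat[i] = (objective(x + step, ograd)
                               - objective(x - step, ograd)) / (2 * eps)
    return ograd_grad, x_grad_grad

rng = np.random.default_rng(0)
x = rng.uniform(0.5, 2.0, size=5)        # keep x > 0 so log is defined
ograd = rng.uniform(-1.0, 1.0, size=5)   # head gradient of the first backward pass
ograd2 = rng.uniform(-1.0, 1.0, size=5)  # independent head gradient of the second pass

analytic = log_bwd_bwd(x, ograd, ograd2)
numeric = numeric_bwd_bwd(x, ograd, ograd2)
for a, n in zip(analytic, numeric):
    print(np.allclose(a, n, atol=1e-6))
```

In this sketch, the gradient with respect to `ograd` depends only on `x` and `ograd2`. The gradient with respect to `x` involves both head gradients.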
