t-vi commented on pull request #5959: URL: https://github.com/apache/incubator-tvm/pull/5959#issuecomment-651298087
This pull request is only about the second commit; the first is #5946. I noticed that my gradient had many more O(n^3) operations (matmul etc.) than it should have, and tracked this down to how gradients are computed when a value is used several times in the computation. Graphs become very large and unwieldy whenever the computation is not purely sequential. Also, CSE cannot eliminate the duplication, because it is the "output part" that is duplicated rather than the input (in theory, one could commute the add with all the gradient ops). While this change doesn't fix anything, it might also mitigate other effects people see when working with first-order gradients (e.g. #4534).
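To illustrate the duplication described above, here is a minimal sketch in plain Python. It is not TVM's implementation: `Node`, `count_backward_per_use`, and `count_backward_accumulated` are hypothetical names invented for this example, and the code only counts how often each operator's backward step would run, without computing any real gradients.

```python
from collections import defaultdict


class Node:
    """Hypothetical toy graph node: an operator name plus its input nodes."""

    def __init__(self, op, inputs=()):
        self.op = op
        self.inputs = tuple(inputs)


def topo_order(out):
    """Return the nodes reachable from `out`, inputs before their users."""
    seen, order = set(), []

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for inp in node.inputs:
            visit(inp)
        order.append(node)

    visit(out)
    return order


def count_backward_per_use(out):
    """Backpropagate every incoming gradient separately.

    A node used k times has its backward step emitted k times, and so do
    all of its ancestors along each path.
    """
    counts = defaultdict(int)

    def back(node):
        if node.inputs:  # leaves (inputs) have no backward op
            counts[node.op] += 1
        for inp in node.inputs:
            back(inp)

    back(out)
    return dict(counts)


def count_backward_accumulated(out):
    """Sum all gradient contributions of a node first, then backpropagate once.

    Visiting nodes in reverse topological order guarantees that every user
    has already contributed before a node's backward step is emitted.
    """
    counts = defaultdict(int)
    for node in reversed(topo_order(out)):
        if node.inputs:
            counts[node.op] += 1
    return dict(counts)


# y = matmul(x, w) is used twice: z = tanh(y) + exp(y)
x, w = Node("input"), Node("input")
y = Node("matmul", [x, w])
z = Node("add", [Node("tanh", [y]), Node("exp", [y])])

per_use = count_backward_per_use(z)
accumulated = count_backward_accumulated(z)
print("per use:    ", per_use)
print("accumulated:", accumulated)
```

In this toy graph the per-use strategy emits the matmul backward step twice, once per gradient arriving at `y`, while the accumulating strategy emits it once. The two duplicated backward steps receive different incoming gradients, which is why a pass that merges identical subexpressions would not combine them.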
