waytrue17 opened a new pull request #19614: URL: https://github.com/apache/incubator-mxnet/pull/19614
## Description ##
MXNet errors out during backward propagation if a CachedOp is present in the computational graph. This blocks custom subgraph partitioning (#18823) and the dynamic shape ops optimization (#18690).

CachedOp currently plays two roles. As the symbol executor (external CachedOp), it runs the whole graph. As an individual operator inside a graph (internal CachedOp), it is combined with other operators.

For backward propagation, the external CachedOp should be invoked through [`CachedOp::Backward()`](https://github.com/apache/incubator-mxnet/blob/11a7903c09b07f741cf81191be77d48fa8f7f584/src/imperative/cached_op.cc#L1016), while the internal CachedOp should be invoked through [`CachedOpBackward()`](https://github.com/apache/incubator-mxnet/blob/11a7903c09b07f741cf81191be77d48fa8f7f584/src/imperative/cached_op.cc#L1130). However, both external and internal CachedOps currently go through [`CachedOp::Backward()`](https://github.com/apache/incubator-mxnet/blob/11a7903c09b07f741cf81191be77d48fa8f7f584/src/imperative/imperative_utils.cc#L91), which causes the error.

This PR introduces a flag, `is_executor`, to distinguish how the two kinds of CachedOp are invoked.

## Checklist ##
### Essentials ###
- [ ] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
- [ ] Changes are complete (i.e. I finished coding on this PR)
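
The dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `CachedOpSketch` and `SelectBackwardPath` are hypothetical names, and only the `is_executor` flag and the two backward entry points come from the description. Where the flag actually lives and how it is set are assumptions.

```cpp
#include <string>

// Hypothetical stand-in for CachedOp. Only the flag named in the PR
// description is modeled; the real class carries far more state.
struct CachedOpSketch {
  // true:  external CachedOp, used as the executor for the whole graph
  // false: internal CachedOp, one operator among others inside a graph
  bool is_executor;
};

// Returns the name of the backward entry point that the description says
// each kind of CachedOp should use. A real implementation would call the
// function rather than return its name.
std::string SelectBackwardPath(const CachedOpSketch& op) {
  return op.is_executor ? "CachedOp::Backward" : "CachedOpBackward";
}
```

With this sketch, `SelectBackwardPath(CachedOpSketch{true})` selects the executor path and `SelectBackwardPath(CachedOpSketch{false})` selects the operator path, so an internal CachedOp no longer falls through to `CachedOp::Backward()`.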
