zixuanweeei commented on issue #18077: Parameter fusion support in Gluon
URL: https://github.com/apache/incubator-mxnet/issues/18077#issuecomment-616012849

> What about optimizer? Would optimizers see fused or unfused params?

From the view of the RNN operator, I think the optimizers will see the fused parameters. In both the forward and backward scenarios, only the fused parameter exists in the arguments dict, and we can use `Block.unfuse()` to overwrite the values of the unfused parameters. Both the `RNN` and `_backward_RNN` operators receive an NDArray holder for the fused parameter:

+ RNN (line 412) https://github.com/apache/incubator-mxnet/blob/dcada9b9c145d2e93d51790d234f0a2ddc7091df/src/operator/rnn.cc#L411-L416
+ _backward_RNN (line 213) https://github.com/apache/incubator-mxnet/blob/dcada9b9c145d2e93d51790d234f0a2ddc7091df/src/operator/rnn.cc#L207-L214

If a model uses [`mx.rnn.FusedRNNCell`](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/rnn/rnn_cell.py#L535), the optimizer applies the gradients to the fused parameter directly. But that is not the case for the RNN layers [`mx.gluon.rnn.***`](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/rnn/rnn_layer.py#L33): they always place a `_rnn_param_concat` operator in front of the fused parameter, so the optimizer (via the backward pass) just delivers the fused gradients to the unfused parameters individually, with several memcpy operations behind the scenes. However, when recording the gradients w.r.t. a specific unfused parameter, the problem does arise. In any case, it is a very helpful feature for the forward pass in terms of performance.
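To make the contrast concrete, here is a minimal sketch of my own (not from the issue or the docs) that just lists the parameter names each API exposes; the exact printed names depend on the default prefixes and the MXNet 1.x version, so treat them as illustrative only:

```python
import mxnet as mx

# gluon.rnn.LSTM keeps separate per-layer i2h/h2h weights and biases; a
# _rnn_param_concat node fuses them only inside the computation graph,
# so the optimizer still sees and updates these unfused arrays.
lstm = mx.gluon.rnn.LSTM(hidden_size=8, num_layers=2)
print(sorted(lstm.collect_params().keys()))
# e.g. [..., 'lstm0_l0_h2h_bias', 'lstm0_l0_h2h_weight',
#       'lstm0_l0_i2h_bias', 'lstm0_l0_i2h_weight', ...]

# mx.rnn.FusedRNNCell (symbol API) exposes a single fused parameter, so the
# optimizer applies gradients to that one fused NDArray directly.
fused = mx.rnn.FusedRNNCell(num_hidden=8, num_layers=2, mode='lstm')
outputs, _ = fused.unroll(length=5, inputs=mx.sym.Variable('data'),
                          merge_outputs=True)
print([name for name in outputs.list_arguments()
       if name.endswith('parameters')])
# e.g. ['lstm_parameters']
```

The point is just that with the gluon layers the unfused arrays are what the trainer/optimizer actually updates, while with `FusedRNNCell` there is only the one fused argument.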