leezu opened a new issue #18077: Parameter fusion support in Gluon
URL: https://github.com/apache/incubator-mxnet/issues/18077

## Description

It's common that the parameters declared by a Block in Gluon don't exactly match the format used by operators in the backend. As a result, there are examples where some parameters are concatenated on every forward pass:

- *RNN* https://github.com/apache/incubator-mxnet/blob/c3b0baaa27e2215eae7ed7676009ea5f4bf49013/python/mxnet/gluon/rnn/rnn_layer.py#L278
- *BERT* https://github.com/dmlc/gluon-nlp/pull/1136#discussion_r377480471

A naive approach is to refactor the respective Gluon Blocks so that they declare the concatenated version of the parameter directly. This does not work in all cases, because we may want to initialize the constituent parameters differently; for example, RNN biases should be initialized differently from RNN weights. The status quo, in which concatenation / fusion has to happen on every forward pass, is not acceptable either.

Proposed solution: Introduce `Block.fuse()` and `Block.unfuse()` APIs. By default they are no-ops. Users can override `fuse` and `unfuse` to declare how the Block's parameters are fused into a new set of (or a single) parameters. `fuse` is called prior to the first `forward`, after `infer_shape`. `export` will require fused parameters. Prior to `save_parameters` or `load_parameters`, the Block is unfused.
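
A minimal sketch of what a user-defined `fuse`/`unfuse` pair could look like under this proposal. The `FusedDense` block, its parameter names, and the `fuse`/`unfuse` hooks themselves are hypothetical and only illustrate the proposed API; nothing here exists in Gluon today.

```python
import mxnet as mx
from mxnet.gluon import Block, Parameter


class FusedDense(Block):
    """Hypothetical dense layer whose weight and bias are fused into one flat parameter."""

    def __init__(self, units, in_units, **kwargs):
        super(FusedDense, self).__init__(**kwargs)
        self._units = units
        self._in_units = in_units
        # Separate parameters so each can receive its own initializer.
        self.weight = Parameter('weight', shape=(units, in_units),
                                init=mx.init.Xavier())
        self.bias = Parameter('bias', shape=(units,), init=mx.init.Zero())
        self.fused = None  # created by fuse()

    def fuse(self):
        # Proposed hook: called once after infer_shape, before the first
        # forward. Concatenate weight and bias into a single flat parameter.
        flat = mx.nd.concat(self.weight.data().reshape(-1),
                            self.bias.data().reshape(-1), dim=0)
        self.fused = Parameter('fused', shape=flat.shape)
        self.fused.initialize()
        self.fused.set_data(flat)

    def unfuse(self):
        # Proposed hook: called before save_parameters / load_parameters.
        # Scatter the fused parameter back into weight and bias.
        flat = self.fused.data()
        n = self._units * self._in_units
        self.weight.set_data(flat[:n].reshape(self._units, self._in_units))
        self.bias.set_data(flat[n:])
        self.fused = None

    def forward(self, x):
        # With the proposal, forward works off the already-fused parameter
        # instead of concatenating weight and bias on every call.
        flat = self.fused.data()
        n = self._units * self._in_units
        weight = flat[:n].reshape(self._units, self._in_units)
        bias = flat[n:]
        return mx.nd.FullyConnected(x, weight, bias, num_hidden=self._units)
```

Under the proposal the framework would invoke `fuse()` automatically before the first `forward` and `unfuse()` around `save_parameters` / `load_parameters`; the hooks could also be called manually when experimenting.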
