leezu opened a new issue #18077: Parameter fusion support in Gluon
URL: https://github.com/apache/incubator-mxnet/issues/18077
 
 
   ## Description
   It's common that the parameters declared by a Block in Gluon don't exactly match the format used by the operators in the backend. As a result, there are examples where some parameters are concatenated on every forward pass (a simplified sketch follows the list below):
   - *RNN*
     
https://github.com/apache/incubator-mxnet/blob/c3b0baaa27e2215eae7ed7676009ea5f4bf49013/python/mxnet/gluon/rnn/rnn_layer.py#L278
   - *BERT*
     https://github.com/dmlc/gluon-nlp/pull/1136#discussion_r377480471
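
   For illustration, here is a minimal, hypothetical sketch of this pattern. The class, parameter names, and shapes are made up and do not reproduce the linked rnn_layer.py or BERT code; it assumes the Gluon 1.x `HybridBlock` / `ParameterDict` API:

```python
import mxnet as mx
from mxnet.gluon import HybridBlock

class ToyFusedOp(HybridBlock):
    """Declares two logically separate weights, but the backend operator
    expects a single concatenated weight, so fusion happens per forward."""
    def __init__(self, in_units, units, **kwargs):
        super(ToyFusedOp, self).__init__(**kwargs)
        with self.name_scope():
            self.weight_a = self.params.get('weight_a', shape=(units, in_units))
            self.weight_b = self.params.get('weight_b', shape=(units, in_units))

    def hybrid_forward(self, F, x, weight_a, weight_b):
        # The concatenation below runs on *every* call, which is the
        # repeated overhead this issue describes.
        fused = F.concat(weight_a, weight_b, dim=0)
        return F.dot(x, fused, transpose_b=True)
```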
   
   A naive approach is to refactor the respective Gluon Blocks so that they declare the concatenated version of the parameter directly. This does not work in all cases, because we may wish to initialize the constituent parameters differently. For example, RNN biases should be initialized differently from RNN weights (see the sketch after this paragraph).
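
   As a hedged illustration (arbitrary shapes), declaring the parameters separately lets each carry its own initializer, which can no longer be expressed declaratively if only the fused tensor is declared:

```python
import mxnet as mx

# Separate declarations keep per-parameter initialization:
weight = mx.gluon.Parameter('weight', shape=(4, 4), init=mx.init.Xavier())
bias = mx.gluon.Parameter('bias', shape=(4,), init=mx.init.Zero())

# A single fused parameter would have to share one initializer, so
# Xavier for the weight part and zeros for the bias part is lost.
```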
   
   The status quo, where in such cases concatenation / fusion has to happen on every forward pass, is not acceptable either.
   
   Proposed solution: introduce `Block.fuse()` and `Block.unfuse()` APIs. By default they are no-ops. Users can override `fuse` and `unfuse` to declare how to fuse the Block's parameters into a new set of parameters (or a single parameter). `fuse` is called before the first `forward`, after `infer_shape`.
   `export` requires fused parameters. Before `save_parameters` or `load_parameters`, the Block is unfused. A sketch of the intended usage is given below.
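
   A rough, purely illustrative sketch of how an overridden `fuse` / `unfuse` pair might look. The `fuse` / `unfuse` hooks do not exist in Gluon today; the class, call points, and gradient handling below are assumptions, not a spec:

```python
import mxnet as mx
from mxnet.gluon import Block

class FusableLinear(Block):
    """Toy Block sketching the proposed fuse()/unfuse() hooks."""
    def __init__(self, in_units, units, **kwargs):
        super(FusableLinear, self).__init__(**kwargs)
        with self.name_scope():
            # Canonical parameters, each with its own initializer.
            self.weight = self.params.get('weight', shape=(units, in_units),
                                          init=mx.init.Xavier())
            self.bias = self.params.get('bias', shape=(units,),
                                        init=mx.init.Zero())
        self._fused = None

    def fuse(self):
        # Per the proposal, this would run once before the first forward,
        # after infer_shape: build the layout the backend operator expects.
        self._fused = mx.nd.concat(self.weight.data().reshape(-1),
                                   self.bias.data(), dim=0)

    def unfuse(self):
        # Per the proposal, this would run before save_parameters() /
        # load_parameters(): drop the fused copy so the canonical,
        # separately declared parameters are authoritative again.
        self._fused = None

    def forward(self, x):
        if self._fused is None:
            self.fuse()  # lazy fusion before the first forward
        units = self.bias.shape[0]
        # NOTE: this sketch ignores how gradients would flow back to the
        # unfused parameters; defining that is part of the proposed API.
        w = self._fused[:-units].reshape(units, -1)
        b = self._fused[-units:]
        return mx.nd.dot(x, w, transpose_b=True) + b
```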
