The current documentation of dropout in the RNN layers / FusedRNN operator is inconsistent:
- The mx.gluon.rnn.{LSTM,RNN,GRU} docstrings (for example: 
https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/rnn/rnn_layer.py#L359)
 state `If non-zero, introduces a dropout layer **on the outputs** of each RNN 
layer except the last layer.` This is consistent with the `unfuse()` method, 
which interleaves LSTMCells with DropoutCells but skips the dropout after the last 
layer (see the sketch after this list): 
https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/rnn/rnn_layer.py#L144.
- However, the fused forward_kernel() uses mx.symbol.RNN, which documents 
dropout as follows: `p (float, optional, default=0) – Dropout probability, 
**fraction of the input** that gets dropped out at training time.`

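A minimal sketch of the unfused version described in the first bullet, assuming the usual gluon.rnn cell API (SequentialRNNCell / LSTMCell / DropoutCell); layer count, hidden size and dropout rate are made up for illustration:

```python
from mxnet import gluon

num_layers, hidden_size, dropout = 3, 100, 0.5

# Hand-built equivalent of what unfuse() produces: a DropoutCell is placed
# on the outputs of every LSTM layer except the last one.
stack = gluon.rnn.SequentialRNNCell()
for i in range(num_layers):
    stack.add(gluon.rnn.LSTMCell(hidden_size))
    if dropout > 0 and i != num_layers - 1:
        stack.add(gluon.rnn.DropoutCell(dropout))
```
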
I assume equivalence between the fused and unfused RNN layers has been tested, so 
the implementation itself is likely fine, but one of the two docstrings (most 
likely mx.symbol.RNN's) should be updated for clarity.
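
For reference, a sketch of the fused counterpart built on mx.symbol.RNN, whose `dropout` argument the wording question is about; if the two code paths are indeed equivalent, this dropout acts between layers just like the interleaved DropoutCells above (sizes again illustrative):

```python
from mxnet import gluon

# Fused 3-layer LSTM; dropout=0.5 should behave like the DropoutCells
# interleaved between layers in the unfused stack.
fused = gluon.rnn.LSTM(100, num_layers=3, dropout=0.5)
```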




