The current documentation on dropout in the RNN layers / the FusedRNN operator is a bit
confusing to me:
- mx.gluon.rnn.{LSTM,RNN,GRU}'s docstring (for example:
https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/rnn/rnn_layer.py#L359)
states `If non-zero, introduces a dropout layer **on the outputs** of each RNN
layer except the last layer.` This is in line with the `unfuse()` method,
which interleaves LSTMCells with DropoutCells but skips the dropout after the last
layer (see the unfused sketch after this list):
https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/rnn/rnn_layer.py#L144.
- However, the fused forward_kernel() uses mx.symbol.RNN, which itself documents
dropout as follows: `p (float, optional, default=0) – Dropout probability,
**fraction of the input** that gets dropped out at training time.` (See the
fused sketch below.)
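
For reference, here is a minimal sketch of the interleaving that `unfuse()` describes, written with gluon's stock SequentialRNNCell/LSTMCell/DropoutCell building blocks (the layer sizes are made up for illustration; this is not the library's own code):

```python
import mxnet as mx

# Assumed, illustrative equivalent of
# mx.gluon.rnn.LSTM(hidden_size=20, num_layers=3, dropout=0.5):
# dropout is applied to the *outputs* of every layer except the last.
num_layers, hidden_size, dropout = 3, 20, 0.5
stack = mx.gluon.rnn.SequentialRNNCell()
for i in range(num_layers):
    stack.add(mx.gluon.rnn.LSTMCell(hidden_size))
    if dropout > 0 and i != num_layers - 1:
        # no DropoutCell after the final LSTMCell
        stack.add(mx.gluon.rnn.DropoutCell(dropout))
```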
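
And the fused path that carries the conflicting "fraction of the input" wording boils down to an mx.symbol.RNN call roughly like this (variable names and sizes are placeholders, not taken from the actual forward_kernel()):

```python
import mxnet as mx

# Assumed, minimal symbolic use of the fused operator; the p argument is the
# one documented as "fraction of the input that gets dropped out".
data = mx.sym.Variable('data')              # (seq_len, batch, input_size)
params = mx.sym.Variable('parameters')      # flattened weights and biases
state = mx.sym.Variable('state')            # initial hidden state
state_cell = mx.sym.Variable('state_cell')  # initial cell state (LSTM only)
rnn = mx.sym.RNN(data=data, parameters=params, state=state,
                 state_cell=state_cell, state_size=20, num_layers=3,
                 mode='lstm', p=0.5)
```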
I assume equivalence between fused and unfused RNN layers has been tested, so
the implementation itself is likely fine, but one of the docstrings (most
likely mx.symbol.RNN's) should be updated for clarity.