To answer your question more directly: if you haven't moved over to Gluon, 
you can set the weight decay multiplier to zero for the parameters before the 
BlockGrad with 
[`set_wd_mult`](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/optimizer.py#L330).
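
To show why a zero multiplier helps, here is a minimal sketch in plain Python. It is not MXNet's optimizer code. It assumes the concern is that weight decay still moves parameters whose gradient is blocked. The `sgd_step` helper and the parameter names `frozen_weight` and `head_weight` are hypothetical, chosen only for illustration.

```python
def sgd_step(weights, grads, lr, wd, wd_mult):
    """One plain SGD step with per-parameter weight-decay multipliers.

    Illustrative only: a simplified stand-in for an optimizer update,
    not the MXNet implementation.
    """
    updated = {}
    for name, w in weights.items():
        # Effective decay is the global wd scaled by this parameter's
        # multiplier (1.0 when no multiplier has been set for it).
        effective_wd = wd * wd_mult.get(name, 1.0)
        updated[name] = w - lr * (grads[name] + effective_wd * w)
    return updated


# Both parameters receive a zero gradient, as they would behind a blocked
# gradient. Only "frozen_weight" has its decay multiplier set to zero.
weights = {"frozen_weight": 2.0, "head_weight": 2.0}
grads = {"frozen_weight": 0.0, "head_weight": 0.0}
wd_mult = {"frozen_weight": 0.0}

new_weights = sgd_step(weights, grads, lr=0.1, wd=0.5, wd_mult=wd_mult)
```

With a zero gradient, `frozen_weight` stays at its original value because its multiplier is zero, while `head_weight` still shrinks from the decay term alone. In MXNet the corresponding step is to pass a dict mapping parameter names to multipliers to `set_wd_mult` on the optimizer instance.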

[ Full content available at: 
https://github.com/apache/incubator-mxnet/issues/12392 ]