To answer your question a little more directly (if you haven't moved over to Gluon): you could set the weight decay multiplier to zero for the parameters before the `BlockGrad` with [`set_wd_mult`](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/optimizer.py#L330), so that weight decay no longer changes them.
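
As a rough sketch of what that could look like: `set_wd_mult` takes a dict mapping parameter names to weight decay multipliers, so one option is to build that dict for everything upstream of the `BlockGrad`. The helper `zero_wd_mult`, the parameter names, and the `conv1_` prefix below are all made up for illustration; substitute the names from your own symbol.

```python
def zero_wd_mult(param_names, frozen_prefixes):
    """Return {name: 0.0} for every parameter whose name starts with
    one of frozen_prefixes (hypothetical helper, not part of MXNet)."""
    prefixes = tuple(frozen_prefixes)
    return {name: 0.0 for name in param_names if name.startswith(prefixes)}


# Hypothetical parameter names; in practice take them from
# your symbol's list_arguments().
param_names = ["conv1_weight", "conv1_bias", "fc1_weight", "fc1_bias"]

# Assume everything prefixed "conv1_" sits before the BlockGrad.
wd_mult = zero_wd_mult(param_names, ["conv1_"])
print(wd_mult)

# Then, with `opt` being your mxnet.optimizer.Optimizer instance:
# opt.set_wd_mult(wd_mult)
```

Parameters that are not in the dict keep whatever multiplier the optimizer assigns them by default.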
[ Full content available at: https://github.com/apache/incubator-mxnet/issues/12392 ]
