sammieghabra opened a new issue #18665: URL: https://github.com/apache/incubator-mxnet/issues/18665
## Description

Hi MXNet team,

My team wants to implement a ShiftScale operator as a layer in MXNet. This layer is similar to batch norm, except that we want `moving_mean` and `moving_var` to be used instead of `data_mean` and `data_var` when computing the layer's output. I see that batch norm has a `use_global_stats` flag, and according to the [MXNet docs](http://beta.mxnet.io/r/api/mx.symbol.BatchNorm.html), setting this flag to true does something close to what I'm trying to do. However, upon inspecting the [batch-norm code](https://github.com/apache/incubator-mxnet/blob/master/src/operator/nn/batch_norm.cc#L260-L271), it appears that `running_mean` and `running_var` are not updated during training when that flag is set.

1. Is there a design reason why setting `use_global_stats` to true won't update the running mean and running variance?
2. We would like to support this shift-scale layer during training. My proposal is to add another flag to the `BatchNorm` operator, say `use_shift_scale`, which would simply replace the batch mean and variance with the running mean and running variance when updating the weights, while still updating the running statistics (see the sketch after this list). Is this something the MXNet team would be OK with?
3. We also plan to train with more than one instance. Will the `running_mean` and `running_var` parameters be the same across instances?

Thanks

Sammie
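## Sketch of the proposed semantics

For concreteness, here is a minimal Gluon sketch of the behaviour I have in mind. This is not a patch against the `BatchNorm` operator itself; it assumes NCHW input, and the class name `ShiftScale`, the parameter names, and the momentum default are all hypothetical:

```python
import mxnet as mx
from mxnet.gluon import nn


class ShiftScale(nn.Block):
    """Hypothetical sketch: normalize with the running statistics
    (as BatchNorm does with use_global_stats=True), but keep updating
    running_mean/running_var from each training batch."""

    def __init__(self, num_features, momentum=0.9, eps=1e-5, **kwargs):
        super(ShiftScale, self).__init__(**kwargs)
        self.momentum = momentum
        self.eps = eps
        with self.name_scope():
            self.gamma = self.params.get('gamma', shape=(num_features,), init='ones')
            self.beta = self.params.get('beta', shape=(num_features,), init='zeros')
            self.running_mean = self.params.get('running_mean', shape=(num_features,),
                                                init='zeros', grad_req='null')
            self.running_var = self.params.get('running_var', shape=(num_features,),
                                               init='ones', grad_req='null')

    def forward(self, x):
        ctx = x.context
        rm = self.running_mean.data(ctx)
        rv = self.running_var.data(ctx)

        if mx.autograd.is_training():
            # Update the running statistics from the batch, exactly as
            # BatchNorm does, even though the output below is computed
            # from the running statistics rather than the batch ones.
            with mx.autograd.pause():
                batch_mean = x.mean(axis=(0, 2, 3))
                centered = x - batch_mean.reshape(1, -1, 1, 1)
                batch_var = (centered ** 2).mean(axis=(0, 2, 3))
                rm[:] = self.momentum * rm + (1 - self.momentum) * batch_mean
                rv[:] = self.momentum * rv + (1 - self.momentum) * batch_var

        # Shift and scale with the running statistics.
        mean = rm.reshape(1, -1, 1, 1)
        var = rv.reshape(1, -1, 1, 1)
        x_hat = (x - mean) / (var + self.eps).sqrt()
        return (x_hat * self.gamma.data(ctx).reshape(1, -1, 1, 1)
                + self.beta.data(ctx).reshape(1, -1, 1, 1))
```

The only difference from `use_global_stats=True` is the update inside the `is_training()` branch; the block is otherwise used like any other Gluon layer (e.g. `net = ShiftScale(64); net.initialize(); y = net(mx.nd.random.uniform(shape=(2, 64, 8, 8)))`).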