wkcn opened a new issue #9511: set_lr_mult() or set_wd_mult() is invalid if not setting param_idx2name for the optimizer
URL: https://github.com/apache/incubator-mxnet/issues/9511

## Description

Hi, all.
I found a problem: I have to set **param_idx2name** on the optimizer if I want **set_lr_mult() or set_wd_mult()** to take effect.
**If param_idx2name is not set, set_lr_mult() and set_wd_mult() are both silently ignored, without any warning.**
However, **it is difficult to define param_idx2name manually because of kvstore and multi-GPU training**.

## Environment info (Required)

Operating System: Arch Linux 4.14.13
MXNet: [20fbda6](https://github.com/apache/incubator-mxnet/commit/20fbda6c9d15ba903fc6416baa7eecf79ab38f1b)
Python: 2.7.14 / 3.6.4

## Build info (Required if built from source)

Compiler (gcc/clang/mingw/visual studio): gcc
MXNet commit hash: 20fbda6c9d15ba903fc6416baa7eecf79ab38f1b
Build config:
```
make -j 4 USE_OPENCV=1 USE_BLAS=openblas
```

## Minimum reproducible example
```python
import logging

import mxnet as mx

logging.getLogger().setLevel(logging.DEBUG)  # log to stdout

mnist = mx.test_utils.get_mnist()
batch_size = 100
train_iter = mx.io.NDArrayIter(mnist['train_data'], mnist['train_label'],
                               batch_size, shuffle=False)
val_iter = mx.io.NDArrayIter(mnist['test_data'], mnist['test_label'], batch_size)

data = mx.sym.var("data")
data = mx.sym.flatten(data=data)
fc1 = mx.sym.FullyConnected(data=data, num_hidden=128)
act1 = mx.sym.Activation(data=fc1, act_type="relu")
fc2 = mx.sym.FullyConnected(data=act1, num_hidden=64)
act2 = mx.sym.Activation(data=fc2, act_type="relu")
act2 = mx.sym.BatchNorm(data=act2)
fc3 = mx.sym.FullyConnected(data=act2, num_hidden=10)
mlp = mx.sym.SoftmaxOutput(data=fc3, name="softmax")

mlp_model = mx.mod.Module(symbol=mlp, context=mx.cpu())

lr = 0.01
params = mlp.list_arguments()
lr_mult = dict()
wd_mult = dict()
idx2name = dict()
for idx, name in enumerate(params):
    lr_mult[name] = 0   # freeze every parameter
    idx2name[idx] = name

optimizer = mx.optimizer.SGD(learning_rate=lr, momentum=0.9, wd=0.0005,
                             rescale_grad=1.0 / batch_size)
optimizer.set_lr_mult(lr_mult)
optimizer.set_wd_mult(wd_mult)

mlp_model.fit(train_iter,
              eval_data=val_iter,
              optimizer=optimizer,
              eval_metric=[mx.metric.Accuracy(), mx.metric.CrossEntropy()],
              batch_end_callback=mx.callback.Speedometer(batch_size, 100),
              num_epoch=20)
```

## Steps to reproduce

1. I set lr_mult to 0 but did not set param_idx2name for the optimizer. The result is wrong: the weights of the network are still updated, even though lr_mult is 0 and they should stay fixed.
```
INFO:root:Epoch[0] Batch [100]  Speed: 5695.61 samples/sec  accuracy=0.531386  cross-entropy=1.513995
INFO:root:Epoch[0] Batch [200]  Speed: 6095.63 samples/sec  accuracy=0.877100  cross-entropy=0.442159
INFO:root:Epoch[0] Batch [300]  Speed: 5751.52 samples/sec  accuracy=0.921100  cross-entropy=0.281648
INFO:root:Epoch[0] Batch [400]  Speed: 6200.54 samples/sec  accuracy=0.933200  cross-entropy=0.231324
INFO:root:Epoch[0] Batch [500]  Speed: 5996.19 samples/sec  accuracy=0.937900  cross-entropy=0.210167
INFO:root:Epoch[0] Train-accuracy=0.955152
INFO:root:Epoch[0] Train-cross-entropy=0.149803
INFO:root:Epoch[0] Time cost=10.007
INFO:root:Epoch[0] Validation-accuracy=0.950700
INFO:root:Epoch[0] Validation-cross-entropy=0.161047
INFO:root:Epoch[1] Batch [100]  Speed: 6367.74 samples/sec  accuracy=0.955644  cross-entropy=0.147375
INFO:root:Epoch[1] Batch [200]  Speed: 5722.35 samples/sec  accuracy=0.961800  cross-entropy=0.133875
INFO:root:Epoch[1] Batch [300]  Speed: 5332.16 samples/sec  accuracy=0.965100  cross-entropy=0.116933
INFO:root:Epoch[1] Batch [400]  Speed: 5303.59 samples/sec  accuracy=0.966900  cross-entropy=0.117010
INFO:root:Epoch[1] Batch [500]  Speed: 5561.86 samples/sec  accuracy=0.964600  cross-entropy=0.121509
```

## What have you tried to solve it?

One workaround is to set param_idx2name manually on the optimizer; however, this is difficult to get right, especially when multiple GPUs are used.
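To illustrate why the multipliers are silently ignored, here is a minimal pure-Python sketch (a hypothetical simplification, not MXNet's actual source): the optimizer stores lr_mult keyed by parameter *name*, but each update only passes an integer *index*, and the index can only be translated into a name through param_idx2name. With an empty mapping, the lookup never matches and the base learning rate is used unchanged, with no warning.

```python
class SketchOptimizer:
    """Hypothetical sketch of a name-keyed lr_mult lookup."""

    def __init__(self, learning_rate, param_idx2name=None):
        self.lr = learning_rate
        self.idx2name = param_idx2name or {}  # empty if never set
        self.lr_mult = {}

    def set_lr_mult(self, lr_mult):
        # Keys are parameter *names*, e.g. 'fullyconnected0_weight'.
        self.lr_mult = dict(lr_mult)

    def _get_lr(self, index):
        lr = self.lr
        # The multiplier applies only if the integer index maps to a name.
        if index in self.idx2name:
            lr *= self.lr_mult.get(self.idx2name[index], 1.0)
        return lr


# Without param_idx2name the multiplier is silently dropped:
opt = SketchOptimizer(0.01)
opt.set_lr_mult({'fullyconnected0_weight': 0})
print(opt._get_lr(0))  # 0.01 -- lr_mult ignored, weights keep updating

# With param_idx2name the same multiplier takes effect:
opt2 = SketchOptimizer(0.01, param_idx2name={0: 'fullyconnected0_weight'})
opt2.set_lr_mult({'fullyconnected0_weight': 0})
print(opt2._get_lr(0))  # 0.0 -- parameter is frozen as intended
```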
The [PR](https://github.com/apache/incubator-mxnet/pull/2337/commits/a77d47d5ec93512a3750c82004122cbbc0cab8a2) shows the definition of param_idx2name. It seems that **whether a kvstore or multiple GPUs are used determines how param_idx2name must be set**. So I think it would be convenient to **set param_idx2name automatically** when the optimizer is initialized in mxnet.module.BaseModule.

Here is [the code](https://github.com/wkcn/incubator-mxnet/commit/4e89621c37490bdd03a599d5aa1bf49976fddb2d) I modified. With this change, the result is correct when lr_mult is set to 0 and param_idx2name is not set: accuracy stays near chance, showing the weights are no longer updated.
```
INFO:root:Epoch[0] Batch [100]  Speed: 5697.48 samples/sec  accuracy=0.079604  cross-entropy=2.302685
INFO:root:Epoch[0] Batch [200]  Speed: 6142.33 samples/sec  accuracy=0.080000  cross-entropy=2.302679
INFO:root:Epoch[0] Batch [300]  Speed: 5620.36 samples/sec  accuracy=0.082400  cross-entropy=2.302705
INFO:root:Epoch[0] Batch [400]  Speed: 5679.43 samples/sec  accuracy=0.084000  cross-entropy=2.302689
INFO:root:Epoch[0] Batch [500]  Speed: 6029.99 samples/sec  accuracy=0.079000  cross-entropy=2.302701
INFO:root:Epoch[0] Train-accuracy=0.078586
INFO:root:Epoch[0] Train-cross-entropy=2.302687
INFO:root:Epoch[0] Time cost=11.746
INFO:root:Epoch[0] Validation-accuracy=0.079100
INFO:root:Epoch[0] Validation-cross-entropy=2.302701
INFO:root:Epoch[1] Batch [100]  Speed: 2341.08 samples/sec  accuracy=0.079604  cross-entropy=2.302685
INFO:root:Epoch[1] Batch [200]  Speed: 3169.10 samples/sec  accuracy=0.080000  cross-entropy=2.302679
INFO:root:Epoch[1] Batch [300]  Speed: 5883.45 samples/sec  accuracy=0.082400  cross-entropy=2.302705
INFO:root:Epoch[1] Batch [400]  Speed: 5527.54 samples/sec  accuracy=0.084000  cross-entropy=2.302689
INFO:root:Epoch[1] Batch [500]  Speed: 5744.79 samples/sec  accuracy=0.079000  cross-entropy=2.302701
```
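For the single-device, no-kvstore case, a manual workaround is to build param_idx2name from the symbol's argument list before constructing the optimizer. The helper below is a hypothetical sketch (the `build_idx2name` name and the assumption that data/label inputs must be excluded are mine, for illustration); the hard-coded argument names mimic what `mlp.list_arguments()` might return for the MLP above.

```python
def build_idx2name(arg_names, input_names=('data', 'softmax_label')):
    """Map parameter index -> name, skipping non-parameter inputs.

    Hypothetical helper: data and label inputs are not optimizer
    parameters, so they are excluded before enumerating.
    """
    params = [n for n in arg_names if n not in input_names]
    return {idx: name for idx, name in enumerate(params)}


# Assumed stand-in for mlp.list_arguments() from the example above:
arg_names = ['data', 'fullyconnected0_weight', 'fullyconnected0_bias',
             'softmax_label']
idx2name = build_idx2name(arg_names)
print(idx2name)  # {0: 'fullyconnected0_weight', 1: 'fullyconnected0_bias'}
```

As the PR linked above suggests, this simple enumeration no longer holds once a kvstore or multiple GPUs change the index layout, which is why setting the mapping automatically inside mxnet.module.BaseModule would be more robust than any per-user recipe.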
