wkcn opened a new issue #9511: set_lr_mult() or set_wd_mult() is invalid if not 
setting param_idx2name for the optimizer
URL: https://github.com/apache/incubator-mxnet/issues/9511
 
 
   ## Description
   Hi, all.
   
   I found that I have to set **param_idx2name** for the optimizer if I want **set_lr_mult() or set_wd_mult()** to take effect.

   **If param_idx2name is not set, set_lr_mult() and set_wd_mult() are both silently ignored, with no warning at all**.

   However, **it is difficult to define param_idx2name by hand because of kvstore and multi-GPU training**.
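
   As I understand it, the optimizer looks up the multiplier by parameter *name*, and idx2name is the only bridge from the integer index the updater sees to that name; with an empty idx2name the lookup silently falls back to a multiplier of 1.0. A simplified sketch of that behavior (not MXNet's actual source, just an illustration of the lookup):

```python
# Simplified sketch (not MXNet's real code) of how the effective
# learning rate is resolved from lr_mult via idx2name.
def get_lr(base_lr, index, lr_mult, idx2name):
    """Return the effective learning rate for the parameter at `index`.

    `lr_mult` is keyed by parameter *name*; `idx2name` maps the integer
    index seen by the updater back to that name. If `idx2name` was never
    provided, the lookup misses and silently defaults to 1.0.
    """
    name = idx2name.get(index)  # None when idx2name is empty
    return base_lr * lr_mult.get(name, 1.0)

lr_mult = {'fc1_weight': 0.0}

# With idx2name set, the multiplier is honoured:
get_lr(0.01, 0, lr_mult, {0: 'fc1_weight'})  # -> 0.0

# Without idx2name, the multiplier is silently ignored:
get_lr(0.01, 0, lr_mult, {})                 # -> 0.01
```

   This is why the repro below keeps learning even though every lr_mult is 0: the multipliers are keyed by name, but the updater only ever sees integer indices.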
   
   ## Environment info (Required)
   Operating System: Arch Linux 4.14.13
   MXNet: 
[20fbda6](https://github.com/apache/incubator-mxnet/commit/20fbda6c9d15ba903fc6416baa7eecf79ab38f1b)
   Python: 2.7.14/3.6.4
   
   
   ## Build info (Required if built from source)
   
   Compiler (gcc/clang/mingw/visual studio): gcc
   
   MXNet commit hash:
   20fbda6c9d15ba903fc6416baa7eecf79ab38f1b
   
   Build config:
   ```
   make -j 4 USE_OPENCV=1 USE_BLAS=openblas
   ```
   
   ## Minimum reproducible example
   ```python
   import mxnet as mx
   import logging
   logging.getLogger().setLevel(logging.DEBUG)  # logging to stdout
   
   mnist = mx.test_utils.get_mnist()
   
   batch_size = 100
   train_iter = mx.io.NDArrayIter(mnist['train_data'], mnist['train_label'], 
batch_size, shuffle=False)
   val_iter = mx.io.NDArrayIter(mnist['test_data'], mnist['test_label'], 
batch_size)
   
   data = mx.sym.var("data")
   data = mx.sym.flatten(data = data)
   
   fc1 = mx.sym.FullyConnected(data = data, num_hidden = 128)
   act1 = mx.sym.Activation(data = fc1, act_type = "relu")
   fc2 = mx.sym.FullyConnected(data = act1, num_hidden = 64)
   act2 = mx.sym.Activation(data = fc2, act_type = "relu")
   act2 = mx.sym.BatchNorm(data = act2)
   
   fc3 = mx.sym.FullyConnected(data = act2, num_hidden = 10)
   mlp = mx.sym.SoftmaxOutput(data = fc3, name = "softmax")
   
   mlp_model = mx.mod.Module(symbol = mlp, context = mx.cpu())
   
   lr = 0.01
   
   params = mlp.list_arguments()
   lr_mult = dict()
   wd_mult = dict()
   idx2name = dict()
   for idx, name in enumerate(params):
       lr_mult[name] = 0
       idx2name[idx] = name
   
   optimizer = mx.optimizer.SGD(learning_rate = lr, 
           momentum = 0.9, 
           wd = 0.0005, 
           rescale_grad = 1.0 / batch_size)
   
   optimizer.set_lr_mult(lr_mult)
   optimizer.set_wd_mult(wd_mult)
   
   mlp_model.fit(train_iter,
                 eval_data = val_iter,
                 optimizer = optimizer,
                 eval_metric = [mx.metric.Accuracy(), mx.metric.CrossEntropy()],
                 batch_end_callback = mx.callback.Speedometer(batch_size, 100),
                 num_epoch = 20)
   ```
   
   ## Steps to reproduce
   
   1. I set lr_mult to 0 but did not set param_idx2name for the optimizer. The result is wrong: since lr_mult is 0, the weights of the network should not be updated, yet the accuracy keeps improving.
   ```
   INFO:root:Epoch[0] Batch [100]       Speed: 5695.61 samples/sec      
accuracy=0.531386       cross-entropy=1.513995
   INFO:root:Epoch[0] Batch [200]       Speed: 6095.63 samples/sec      
accuracy=0.877100       cross-entropy=0.442159
   INFO:root:Epoch[0] Batch [300]       Speed: 5751.52 samples/sec      
accuracy=0.921100       cross-entropy=0.281648
   INFO:root:Epoch[0] Batch [400]       Speed: 6200.54 samples/sec      
accuracy=0.933200       cross-entropy=0.231324
   INFO:root:Epoch[0] Batch [500]       Speed: 5996.19 samples/sec      
accuracy=0.937900       cross-entropy=0.210167
   INFO:root:Epoch[0] Train-accuracy=0.955152
   INFO:root:Epoch[0] Train-cross-entropy=0.149803
   INFO:root:Epoch[0] Time cost=10.007
   INFO:root:Epoch[0] Validation-accuracy=0.950700
   INFO:root:Epoch[0] Validation-cross-entropy=0.161047
   INFO:root:Epoch[1] Batch [100]       Speed: 6367.74 samples/sec      
accuracy=0.955644       cross-entropy=0.147375
   INFO:root:Epoch[1] Batch [200]       Speed: 5722.35 samples/sec      
accuracy=0.961800       cross-entropy=0.133875
   INFO:root:Epoch[1] Batch [300]       Speed: 5332.16 samples/sec      
accuracy=0.965100       cross-entropy=0.116933
   INFO:root:Epoch[1] Batch [400]       Speed: 5303.59 samples/sec      
accuracy=0.966900       cross-entropy=0.117010
   INFO:root:Epoch[1] Batch [500]       Speed: 5561.86 samples/sec      
accuracy=0.964600       cross-entropy=0.121509
   ```
   
   ## What have you tried to solve it?
   
   
   One workaround is to set param_idx2name manually for the optimizer; however, it is difficult to set correctly, especially when using multiple GPUs.

   This [PR](https://github.com/apache/incubator-mxnet/pull/2337/commits/a77d47d5ec93512a3750c82004122cbbc0cab8a2) shows the definition of param_idx2name.

   It seems that **whether kvstore or multi-GPU is used determines how param_idx2name has to be set**.
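
   Based on my reading of the PR above, a sketch of the two mappings (under the assumption that, without kvstore, the updater numbers slots as param_idx * num_devices + device_idx, interleaved by device):

```python
# Sketch: build idx2name for both update schemes.  The interleaved
# numbering in the non-kvstore branch is an assumption taken from the
# linked PR, not a guaranteed MXNet contract.
def build_idx2name(param_names, num_devices, update_on_kvstore):
    """Map updater indices to parameter names for either update scheme."""
    if update_on_kvstore:
        # One slot per parameter.
        return {i: name for i, name in enumerate(param_names)}
    # One slot per (parameter, device) pair.
    return {i * num_devices + k: name
            for i, name in enumerate(param_names)
            for k in range(num_devices)}

names = ['fc1_weight', 'fc1_bias']
build_idx2name(names, 1, True)   # {0: 'fc1_weight', 1: 'fc1_bias'}
build_idx2name(names, 2, False)  # {0: 'fc1_weight', 1: 'fc1_weight',
                                 #  2: 'fc1_bias', 3: 'fc1_bias'}
```

   A user cannot easily know which of these two layouts applies, which is why building the mapping by hand is fragile.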
   
   So I think it would be convenient to **set param_idx2name automatically** when the optimizer is initialized in mxnet.module.BaseModule.
   
   Here is [the 
code](https://github.com/wkcn/incubator-mxnet/commit/4e89621c37490bdd03a599d5aa1bf49976fddb2d)
 I modified.
   
   And the result is correct when lr_mult is set to 0 and param_idx2name is not set manually:
   ```
   INFO:root:Epoch[0] Batch [100]       Speed: 5697.48 samples/sec      
accuracy=0.079604       cross-entropy=2.302685
   INFO:root:Epoch[0] Batch [200]       Speed: 6142.33 samples/sec      
accuracy=0.080000       cross-entropy=2.302679
   INFO:root:Epoch[0] Batch [300]       Speed: 5620.36 samples/sec      
accuracy=0.082400       cross-entropy=2.302705
   INFO:root:Epoch[0] Batch [400]       Speed: 5679.43 samples/sec      
accuracy=0.084000       cross-entropy=2.302689
   INFO:root:Epoch[0] Batch [500]       Speed: 6029.99 samples/sec      
accuracy=0.079000       cross-entropy=2.302701
   INFO:root:Epoch[0] Train-accuracy=0.078586
   INFO:root:Epoch[0] Train-cross-entropy=2.302687
   INFO:root:Epoch[0] Time cost=11.746
   INFO:root:Epoch[0] Validation-accuracy=0.079100
   INFO:root:Epoch[0] Validation-cross-entropy=2.302701
   INFO:root:Epoch[1] Batch [100]       Speed: 2341.08 samples/sec      
accuracy=0.079604       cross-entropy=2.302685
   INFO:root:Epoch[1] Batch [200]       Speed: 3169.10 samples/sec      
accuracy=0.080000       cross-entropy=2.302679
   INFO:root:Epoch[1] Batch [300]       Speed: 5883.45 samples/sec      
accuracy=0.082400       cross-entropy=2.302705
   INFO:root:Epoch[1] Batch [400]       Speed: 5527.54 samples/sec      
accuracy=0.084000       cross-entropy=2.302689
   INFO:root:Epoch[1] Batch [500]       Speed: 5744.79 samples/sec      
accuracy=0.079000       cross-entropy=2.302701
   ```
   
   
   
