yifeim opened a new issue #10494: stop_gradient fails under adam and wd
URL: https://github.com/apache/incubator-mxnet/issues/10494
 
 
   ## Description
   mx.sym.stop_gradient fails to protect the parameters when using the adam optimizer with wd > 0. It works as expected otherwise (with sgd, or with wd = 0).
   
   ## Environment info (Required)
   
   ----------Python Info----------
   Version      : 3.6.4
   Compiler     : GCC 7.2.0
   Build        : ('default', 'Jan 16 2018 18:10:19')
   Arch         : ('64bit', '')
   ------------Pip Info-----------
   Version      : 9.0.1
   Directory    : /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/pip
   ----------MXNet Info-----------
   Version      : 1.0.0
   Directory    : /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet
   Commit Hash  : 9ef196909ec7bf9cdda66d5b97c92793109798e1
   ----------System Info----------
   Platform     : Linux-4.4.0-1054-aws-x86_64-with-debian-stretch-sid
   system       : Linux
   node         : ip-172-31-0-77
   release      : 4.4.0-1054-aws
   version      : #63-Ubuntu SMP Wed Mar 28 19:42:42 UTC 2018
   ----------Hardware Info----------
   machine      : x86_64
   processor    : x86_64
   ----------Network Test----------
   Setting timeout: 10
   Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0062 sec, LOAD: 0.5940 sec.
   Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.1194 sec, LOAD: 0.0632 sec.
   Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0849 sec, LOAD: 0.1816 sec.
   Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0444 sec, LOAD: 0.1512 sec.
   Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0023 sec, LOAD: 0.7215 sec.
   Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0093 sec, LOAD: 0.0760 sec.
   
   Package used (Python/R/Scala/Julia):
   I'm using Python.
   
   ## Error Message:
   No error message; the output is unexpected: the fc parameters change even though stop_gradient blocks their gradients.
   
   ## Minimum reproducible example
   
   ```python
   import mxnet as mx
   mx.random.seed(0)
   num_classes=2
   
   # data
   data_iter = mx.io.NDArrayIter(
       mx.nd.random.uniform(shape=(10, 5)),
       mx.nd.floor(mx.nd.random.uniform(shape=(10,))*num_classes),
       10
   )
   
   # symbol and module
   data          = mx.sym.var('data')
   softmax_label = mx.sym.var('softmax_label')
   loss = mx.sym.SoftmaxOutput(
       mx.sym.stop_gradient(
           mx.sym.FullyConnected(data, num_hidden=num_classes, name='fc')
       ),
       softmax_label
   )
   mod = mx.mod.Module(loss)
   mod.bind(data_iter.provide_data, data_iter.provide_label)
   mod.init_params()
   
   # initial
   print(dict(mod.get_params()[0])['fc_weight'])
   ## [[-0.00470889 -0.00627335  0.00548467  0.00473836 -0.00087699]
   ## [-0.00566899  0.00136868 -0.00729564 -0.0096242  -0.00351718]]
   
   # okay (stop_gradient) for default training
   mod.fit(data_iter, num_epoch=1000)
   print(dict(mod.get_params()[0])['fc_weight'])
   ## [[-0.00470889 -0.00627335  0.00548467  0.00473836 -0.00087699]
   ## [-0.00566899  0.00136868 -0.00729564 -0.0096242  -0.00351718]]
   
   # not okay with adam and wd
    mod.init_optimizer(optimizer='adam', optimizer_params={'wd':1e-10}, force_init=True)
   mod.fit(data_iter, num_epoch=1000)
   print(dict(mod.get_params()[0])['fc_weight'])
   ## [[-0.00468163 -0.00623705  0.00545293  0.00471094 -0.00087192]
   ## [-0.00563618  0.00136076 -0.00725341 -0.00956851 -0.00349682]]
   ```
   
   ## What have you tried to solve it?
   
   1. As a workaround, avoid combining adam with wd > 0.
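   A plausible explanation (my reading of the optimizer update rules, not verified against MXNet's source): with weight decay, the effective gradient becomes grad + wd * weight, so even a gradient zeroed by stop_gradient turns into a small nonzero value. sgd's step, lr * wd * weight, stays negligibly small, but adam's normalized step, lr * m_hat / (sqrt(v_hat) + eps), rescales that tiny value to a visible update. A minimal single-parameter sketch of the two standard update rules (plain Python, not MXNet code; hyperparameter values are illustrative):
   
   ```python
   import math
   
   # One scalar weight, frozen by stop_gradient => raw gradient is always 0.
   lr, wd = 0.01, 1e-10
   beta1, beta2, eps = 0.9, 0.999, 1e-8   # common adam defaults
   w_sgd = w_adam = -0.005
   m = v = 0.0
   
   for t in range(1, 1001):
       # Effective gradient after weight decay is folded in: 0 + wd * w.
       g = wd * w_adam
       # Standard adam moment updates and bias correction.
       m = beta1 * m + (1 - beta1) * g
       v = beta2 * v + (1 - beta2) * g * g
       m_hat = m / (1 - beta1 ** t)
       v_hat = v / (1 - beta2 ** t)
       # Normalized step: magnitude ~ lr * g / eps when |g| << eps.
       w_adam -= lr * m_hat / (math.sqrt(v_hat) + eps)
   
       # sgd step with the same decayed gradient: magnitude lr * wd * |w|.
       w_sgd -= lr * (wd * w_sgd)
   
   print(abs(w_sgd + 0.005))   # essentially zero: sgd leaves the weight unchanged
   print(abs(w_adam + 0.005))  # clearly nonzero: adam amplifies the wd term
   ```
   
   This reproduces the reported asymmetry: the sgd drift is on the order of lr * wd * |w| per step, while adam's normalization makes the drift roughly lr-sized relative to eps, so the "frozen" weight visibly moves.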
