access2rohit edited a comment on issue #17960:
URL: https://github.com/apache/incubator-mxnet/issues/17960#issuecomment-621444309
@szhengac I changed this line in run_pretraining.py:
```
trainer = mx.gluon.Trainer(model.collect_params(), 'bertadam', optim_params,
                           update_on_kvstore=False, kvstore=store)
```
to
```
trainer = mx.gluon.Trainer(model.collect_params(), 'lamb', optim_params,
                           update_on_kvstore=False, kvstore=store)
```
I still get the following error:
```
/home/ubuntu/workspace/incubator-mxnet/python/mxnet/optimizer/optimizer.py:163: UserWarning: WARNING: New optimizer gluonnlp.optimizer.lamb.LAMB is overriding existing optimizer mxnet.optimizer.lamb.LAMB
  Optimizer.opt_registry[name].__name__))
INFO:root:Namespace(accumulate=4, batch_size=8, batch_size_eval=8, ckpt_dir='./ckpt_dir', ckpt_interval=25000, data='/home/ubuntu/.mxnet/datasets/bert_input/part-000.npz', data_eval='/home/ubuntu/.mxnet/datasets/bert_input/part-000.npz', dataset_name='book_corpus_wiki_en_uncased', dtype='float16', dummy_data_len=None, gpus='0', kvstore='device', log_interval=250, lr=0.0001, model='bert_12_768_12', num_buckets=10, num_steps=100000, pretrained=False, profile=None, start_step=0, use_avg_len=False, verbose=False, warmup_ratio=0.01)
[20:23:04] ../src/base.cc:84: Upgrade advisory: this mxnet has been built against cuDNN lib version 7501, which is older than the oldest version tested by CI (7600). Set MXNET_CUDNN_LIB_CHECKING=0 to quiet this warning.
[20:23:10] ../src/storage/storage.cc:110: Using GPUPooledRoundedStorageManager.
INFO:root:Using training data at /home/ubuntu/.mxnet/datasets/bert_input/part-000.npz
INFO:root:1 files found.
[20:23:57] ../src/kvstore/././../ndarray/../operator/tensor/../../common/utils.h:474: MXNET_SAFE_ACCUMULATION=1 is recommended for LayerNorm with float16 inputs. See https://mxnet.apache.org/api/faq/env_var for more details.
[20:23:57] ../src/operator/nn/./../../common/utils.h:474: MXNET_SAFE_ACCUMULATION=1 is recommended for softmax with float16 inputs. See https://mxnet.apache.org/api/faq/env_var for more details.
[20:23:57] ../src/kvstore/././../ndarray/../operator/tensor/../../common/utils.h:474: MXNET_SAFE_ACCUMULATION=1 is recommended for LayerNorm with float16 inputs. See https://mxnet.apache.org/api/faq/env_var for more details.
[20:23:58] ../src/operator/nn/./../../common/utils.h:474: MXNET_SAFE_ACCUMULATION=1 is recommended for softmax with float16 inputs. See https://mxnet.apache.org/api/faq/env_var for more details.
Traceback (most recent call last):
  File "/home/ubuntu/MXNet-Benchmarks/mxnet_scripts/training_scripts/bert/run_pretraining.py", line 237, in <module>
    train(data_train, model, nsp_loss, mlm_loss, len(vocab), ctx, store)
  File "/home/ubuntu/MXNet-Benchmarks/mxnet_scripts/training_scripts/bert/run_pretraining.py", line 192, in train
    fp16_trainer.step(1, max_norm=1)
  File "/home/ubuntu/MXNet-Benchmarks/mxnet_scripts/training_scripts/bert/fp16_utils.py", line 171, in step
    self.fp32_trainer.update(step_size)
  File "/home/ubuntu/workspace/incubator-mxnet/python/mxnet/gluon/trainer.py", line 437, in update
    self._update(ignore_stale_grad)
  File "/home/ubuntu/workspace/incubator-mxnet/python/mxnet/gluon/trainer.py", line 470, in _update
    updater(i, g, w)
  File "/home/ubuntu/workspace/incubator-mxnet/python/mxnet/optimizer/updater.py", line 93, in __call__
    self.optimizer.update_multi_precision([i], [w], [g], [self.states[i]])
  File "/home/ubuntu/workspace/incubator-mxnet/python/mxnet/optimizer/optimizer.py", line 349, in update_multi_precision
    self.update(indices, weights_master_copy, grads32, original_states)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/gluonnlp/optimizer/lamb.py", line 94, in update
    assert(isinstance(weight, NDArray))
AssertionError
```
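
For what it's worth, the UserWarning at the top of the log suggests the plain string `'lamb'` no longer resolves to MXNet's built-in LAMB: once gluonnlp is imported, its LAMB overrides the registry entry, so the Trainer ends up with `gluonnlp.optimizer.lamb.LAMB`. A minimal sketch to confirm which class the name resolves to (assuming gluonnlp is imported before the check, as run_pretraining.py does):
```
import mxnet as mx
import gluonnlp  # triggers the "New optimizer ... is overriding" warning above

# mx.optimizer.create() looks the name up in the optimizer registry,
# so this shows which LAMB implementation 'lamb' actually resolves to.
opt = mx.optimizer.create('lamb')
print(type(opt).__module__, type(opt).__name__)
```
If the override warning is accurate, the module printed here should be `gluonnlp.optimizer.lamb` rather than `mxnet.optimizer.lamb`.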
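
The traceback itself points at an API mismatch rather than the optimizer math: `update_multi_precision` in mxnet/optimizer/optimizer.py passes parallel lists (`self.update(indices, weights_master_copy, grads32, original_states)`), while gluonnlp's `LAMB.update` asserts its `weight` argument is a single NDArray, so it seems to still implement the older one-parameter-per-call signature. If that reading is right, a sketch of a workaround is a thin adapter that unpacks the lists; `ListCompatLAMB` is a hypothetical name (not part of either library), and the trainer line reuses `model`, `optim_params`, and `store` from the snippet above:
```
import mxnet as mx
from gluonnlp.optimizer.lamb import LAMB

class ListCompatLAMB(LAMB):
    """Hypothetical adapter: newer MXNet calls update() with parallel lists,
    while gluonnlp's LAMB expects one index/weight/grad/state per call."""
    def update(self, indices, weights, grads, states):
        for index, weight, grad, state in zip(indices, weights, grads, states):
            super().update(index, weight, grad, state)

# Passing an optimizer instance instead of the 'lamb' string skips the
# registry lookup; hyper-parameters then go to the constructor (this assumes
# optim_params only holds arguments the LAMB constructor accepts).
trainer = mx.gluon.Trainer(model.collect_params(), ListCompatLAMB(**optim_params),
                           update_on_kvstore=False, kvstore=store)
```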
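
Separately, since this run uses `dtype='float16'`, it may be worth acting on the repeated MXNET_SAFE_ACCUMULATION warnings. One way is to set the variable before the float16 ops execute, e.g. from Python:
```
import os

# Exporting MXNET_SAFE_ACCUMULATION=1 in the shell before launching the
# script works too; the warnings above link to the env_var FAQ for details.
os.environ['MXNET_SAFE_ACCUMULATION'] = '1'
```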