szhengac opened a new issue #18024: Transformer Model Segfault
URL: https://github.com/apache/incubator-mxnet/issues/18024

Training the transformer in [gluonnlp](https://github.com/dmlc/gluon-nlp/tree/master/scripts/machine_translation) with the master branch leads to a segmentation fault. The same training passed with the nightly build from March 12th. I used an AWS Linux AMI with cuda100.

Command:
`python train_transformer.py --dataset WMT2014BPE --src_lang en --tgt_lang de --batch_size 2700 --optimizer adam --num_accumulated 16 --lr 3.0 --warmup_steps 4000 --save_dir transformer_en_de_u512 --epochs 30 --gpus 0,1,2,3,4,5,6,7 --scaled --average_start 5 --num_buckets 20 --bucket_scheme exp --bleu 13a --log_interval 10`
