ZhennanQin opened a new pull request #14931: Re-enable static cached_op optimization
URL: https://github.com/apache/incubator-mxnet/pull/14931
 
 
   ## Description ##
   This PR was first filed as https://github.com/apache/incubator-mxnet/pull/14785, but it was reverted because of https://github.com/dmlc/gluon-nlp/issues/690. That issue is now fixed.
   
   The output with the fix applied is:
   ```
   $ python run_pretraining.py --gpus 0 --batch_size 32 --lr 2e-5 --data 'out/*.npz' --warmup_ratio 0.5 --num_steps 20 --pretrained --log_interval=2 --data_eval 'out/*.npz' --batch_size_eval 8 --ckpt_dir ckpt
   INFO:root:Namespace(accumulate=1, batch_size=32, batch_size_eval=8, ckpt_dir='ckpt', ckpt_interval=250000, data='out/*.npz', data_eval='out/*.npz', dataset_name='book_corpus_wiki_en_uncased', dtype='float32', dummy_data_len=None, gpus='0', kvstore='device', log_interval=2, lr=2e-05, model='bert_12_768_12', num_buckets=1, num_steps=20, pretrained=True, profile=None, seed=0, start_step=0, use_avg_len=False, verbose=False, warmup_ratio=0.5)
   [12:41:47] src/storage/storage.cc:108: Using GPUPooledRoundedStorageManager.
   INFO:root:Using training data at out/*.npz
   INFO:root:[step 1]      mlm_loss=1.57931        mlm_acc=46.99793        nsp_loss=0.32225        nsp_acc=84.375  throughput=2.4K tks/s   lr=0.0000020 time=1.34, latency=668.3 ms/batch
   INFO:root:[step 3]      mlm_loss=3.27771        mlm_acc=48.94737        nsp_loss=0.47513        nsp_acc=88.333  throughput=6.8K tks/s   lr=0.0000060 time=0.93, latency=466.5 ms/batch
   INFO:root:[step 5]      mlm_loss=2.79173        mlm_acc=50.60241        nsp_loss=0.12542        nsp_acc=92.857  throughput=6.2K tks/s   lr=0.0000100 time=0.91, latency=453.5 ms/batch
   INFO:root:[step 7]      mlm_loss=2.81245        mlm_acc=52.21421        nsp_loss=0.25357        nsp_acc=92.188  throughput=6.5K tks/s   lr=0.0000140 time=1.00, latency=499.9 ms/batch
   INFO:root:[step 9]      mlm_loss=2.09288        mlm_acc=62.17143        nsp_loss=0.01765        nsp_acc=100.000 throughput=6.3K tks/s   lr=0.0000180 time=0.93, latency=464.5 ms/batch
   INFO:root:[step 11]     mlm_loss=1.53132        mlm_acc=71.63121        nsp_loss=0.00595        nsp_acc=100.000 throughput=6.8K tks/s   lr=0.0000090 time=0.98, latency=490.6 ms/batch
   INFO:root:[step 13]     mlm_loss=1.04932        mlm_acc=79.85866        nsp_loss=0.00190        nsp_acc=100.000 throughput=5.9K tks/s   lr=0.0000070 time=0.97, latency=484.6 ms/batch
   INFO:root:[step 15]     mlm_loss=0.93987        mlm_acc=81.75027        nsp_loss=0.00284        nsp_acc=100.000 throughput=6.3K tks/s   lr=0.0000050 time=1.00, latency=500.4 ms/batch
   INFO:root:[step 17]     mlm_loss=0.68529        mlm_acc=86.50794        nsp_loss=0.00250        nsp_acc=100.000 throughput=6.6K tks/s   lr=0.0000030 time=1.03, latency=516.9 ms/batch
   INFO:root:[step 19]     mlm_loss=0.50951        mlm_acc=89.44900        nsp_loss=0.00218        nsp_acc=100.000 throughput=6.2K tks/s   lr=0.0000010 time=0.92, latency=460.7 ms/batch
   INFO:root:[step 20] Saving checkpoints to ckpt/0000020.params, ckpt/0000020.states.
   INFO:root:Train cost=26.3s
   INFO:root:Using evaluation data at out/*.npz
   INFO:root:[step 1]      mlm_loss=0.20752        mlm_acc=93.26923        nsp_loss=0.00008        nsp_acc=100.000 throughput=3.0K tks/s   lr=0.0000000 time=0.23, latency=115.1 ms/batch
   INFO:root:[step 3]      mlm_loss=0.43117        mlm_acc=91.76955        nsp_loss=0.00139        nsp_acc=100.000 throughput=10.5K tks/s  lr=0.0000000 time=0.15, latency=77.5 ms/batch
   INFO:root:[step 5]      mlm_loss=0.39373        mlm_acc=92.27941        nsp_loss=0.00016        nsp_acc=100.000 throughput=12.0K tks/s  lr=0.0000000 time=0.15, latency=76.4 ms/batch
   INFO:root:[step 7]      mlm_loss=0.37862        mlm_acc=91.18943        nsp_loss=0.00093        nsp_acc=100.000 throughput=10.4K tks/s  lr=0.0000000 time=0.15, latency=73.4 ms/batch
   INFO:root:mlm_loss=0.406        mlm_acc=92.0    nsp_loss=0.001  nsp_acc=100.0
   INFO:root:Eval cost=0.7s
   ```
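
   For reference, the static cached_op path re-enabled here is the one used when a Gluon `HybridBlock` is hybridized with `static_alloc=True` and `static_shape=True`. A minimal sketch that exercises that path (the toy network and shapes are illustrative only, not part of this PR):
   ```python
   import mxnet as mx
   from mxnet.gluon import nn

   # Toy network for illustration; any HybridBlock is handled the same way.
   net = nn.HybridSequential()
   net.add(nn.Dense(128, activation='relu'), nn.Dense(10))
   net.initialize()

   # static_alloc/static_shape route the forward pass through the static
   # cached_op optimization that this PR re-enables.
   net.hybridize(static_alloc=True, static_shape=True)

   x = mx.nd.random.uniform(shape=(32, 64))
   y = net(x)   # first call builds and caches the static execution plan
   y = net(x)   # later calls reuse the cached plan
   y.wait_to_read()
   ```
   The BERT pretraining run above presumably exercises this same path through GluonNLP's hybridized blocks; its output serves as the regression check for the gluon-nlp issue linked above.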
   
   @pengzhao-intel @eric-haibin-lin 
   
   ## Checklist ##
   ### Essentials ###
   Please feel free to remove inapplicable items for your PR.
   - [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to 
the relevant [JIRA issue](https://issues.apache.org/jira/projects/MXNET/issues) 
created (except PRs with tiny changes)
   - [ ] Changes are complete (i.e. I finished coding on this PR)
   - [ ] All changes have test coverage:
   - Unit tests are added for small changes to verify correctness (e.g. adding 
a new operator)
   - Nightly tests are added for complicated/long-running ones (e.g. changing 
distributed kvstore)
   - Build tests will be added for build configuration changes (e.g. adding a 
new build option with NCCL)
   - [ ] Code is well-documented: 
   - For user-facing API changes, API doc string has been updated. 
   - For new C++ functions in header files, their functionalities and arguments 
are documented. 
   - For new examples, a README.md is added to explain what the example does, the source of the dataset, the expected performance on the test set, and a reference to the original paper if applicable
   - Check the API doc at 
http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
   - [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
   
   ### Changes ###
   - [ ] Feature1, tests, (and when applicable, API doc)
   - [ ] Feature2, tests, (and when applicable, API doc)
   
   ## Comments ##
   - If this change is backward incompatible, explain why it must be made.
   - Interesting edge cases to note here
   
