access2rohit edited a comment on issue #17331:
URL: https://github.com/apache/incubator-mxnet/issues/17331#issuecomment-622665471


   @szha @eric-haibin-lin @apeforest 
   
   Results below compare the current master with the new broadcast_axis changes, measured with a single-GPU training run on a p3.16xlarge instance.
   
   BERT run command:
   ```
   python3 run_pretraining.py \
       --data='./part-0000.train' \
       --data_eval='./part-0000.train' \
       --num_steps 100 --lr 1e-4 --optimizer lamb --accumulate 1 --raw \
       --gpus 0 --num_dataset_workers 2 --num_batch_workers 1 --circle_length 1 \
       --total_batch_size 4 --total_batch_size_eval 4 --log_interval 10
   ```
   
   Results:
   
   
   | Code Version | throughput avg (samples/sec) | throughput p50 | throughput p90 | total time (training only, ignoring evaluation steps) |
   |--------------|------------------------------|----------------|----------------|--------------------------------------------------------|
   | master LT    | 24.38k                       | 25.50k         | 28.47k         | 134.8 sec                                               |
   | master       | 25.90k                       | 25.90k         | 27.82k         | 131.9 sec                                               |
   | new LT       | 25.87k                       | 25.80k         | 28.00k         | 127.3 sec                                               |
   | new          | 25.92k                       | 25.80k         | 27.80k         | 131.5 sec                                               |
   
   
   "new" refers to mxnet code with optimized broadcast_axis.
   "master" refers to mxnet master branch code
   "LT" refers to of the build was done after enabling large tensor.

