eric-haibin-lin commented on a change in pull request #11533: Fix image 
classification scripts and Improve Fp16 tutorial
URL: https://github.com/apache/incubator-mxnet/pull/11533#discussion_r201191232
 
 

 ##########
 File path: docs/faq/float16.md
 ##########
 @@ -102,9 +102,17 @@ python fine-tune.py --network resnet --num-layers 50 
--pretrained-model imagenet
 ```
 
 ## Example training results
-Here is a plot to compare the training curves of a Resnet50 v1 network on the 
Imagenet 2012 dataset. These training jobs ran for 95 epochs with a batch size 
of 1024 using a learning rate of 0.4 decayed by a factor of 1 at epochs 
30, 60, 90 and used Gluon. The only changes made for the float16 job when 
compared to the float32 job were that the network and data were cast to 
float16, and the multi-precision mode was used for the optimizer. The final 
accuracies at the 95th epoch were **76.598% for float16** and **76.486% for 
float32**. The difference is within normal random variation, and there 
is no reason to expect float16 to have better accuracy than float32 in general. 
This run was approximately **65% faster** to train with float16.
+Let us consider training a Resnet50 v1 model on the Imagenet 2012 dataset. For 
this model, GPU memory usage is close to the capacity of a V100 GPU with a 
batch size of 128 when using float32. Using float16 allows a batch size of 256. 
Shared below are results using 8 V100 GPUs. Let us compare the three 
scenarios that arise here: float32 with a batch size of 1024, float16 with a 
batch size of 1024, and float16 with a batch size of 2048. These jobs trained 
for 90 epochs using a learning rate of 0.4 for the batch size of 1024 and 0.8 
for the batch size of 2048. The learning rate was decayed by a factor of 0.1 at 
the 30th, 60th and 80th epochs. The only changes made for the float16 jobs when 
compared to the float32 job were that the network and data were cast to 
float16, and the multi-precision mode was used for the optimizer. The final 
accuracy at the 90th epoch and the time to train are tabulated below for these 
three scenarios. The top-1 validation errors at the end of each epoch are also 
plotted below.
 
 Review comment:
   It's better to be specific about the overall hardware setup (it was not done 
on a DGX).
   
   `Shared below are results using 8 V100 GPUs` ->
   `Shared below are results using 8 V100 GPUs on an AWS p3.16xlarge instance.`
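   
   For readers of the tutorial, a minimal single-GPU Gluon sketch of the two 
changes the added paragraph describes (casting the network and data to float16, 
and enabling multi-precision mode in the optimizer) might look like the 
following. The batch size, `epoch_size`, momentum and weight decay values here 
are illustrative assumptions, not the exact settings of the 8-GPU runs 
reported above.
   
   ```python
   import mxnet as mx
   from mxnet import gluon, autograd
   from mxnet.gluon.model_zoo import vision
   
   ctx = mx.gpu(0)  # single-GPU sketch; the runs above used 8 V100 GPUs
   
   # Change 1: cast the network parameters to float16
   net = vision.resnet50_v1(classes=1000)
   net.initialize(mx.init.Xavier(), ctx=ctx)
   net.cast('float16')
   
   # Learning-rate schedule: decay by 0.1 at epochs 30, 60 and 80
   # (epoch_size below is an assumed iterations-per-epoch value)
   epoch_size = 1281167 // 1024
   schedule = mx.lr_scheduler.MultiFactorScheduler(
       step=[30 * epoch_size, 60 * epoch_size, 80 * epoch_size], factor=0.1)
   
   # Change 2: multi_precision keeps a float32 master copy of the weights,
   # so the SGD update happens in float32 even though gradients are float16
   trainer = gluon.Trainer(net.collect_params(), 'sgd',
                           {'learning_rate': 0.4, 'momentum': 0.9, 'wd': 1e-4,
                            'multi_precision': True, 'lr_scheduler': schedule})
   
   loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
   
   # Inside the training loop the input batch is cast to float16 as well;
   # data/label here stand in for one batch from an ImageNet DataLoader
   data = mx.nd.random.uniform(shape=(128, 3, 224, 224), ctx=ctx).astype('float16')
   label = mx.nd.zeros((128,), ctx=ctx)
   with autograd.record():
       loss = loss_fn(net(data), label)
   loss.backward()
   trainer.step(data.shape[0])
   ```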
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
