eric-haibin-lin commented on a change in pull request #10391: [MXNET-139] 
Tutorial for mixed precision training with float16

 File path: docs/tutorials/python/
 @@ -0,0 +1,280 @@
+# Mixed precision training using float16
+The computational resources required for training deep neural networks has 
been increasing of late because of complexity of the architectures and size of 
models. Mixed precision training allows us to reduces the resources required by 
using lower precision arithmetic. In this approach we train using 16 bit 
floating points (half precision) while using 32 bit floating points (single 
precision) for output buffers of float16 computation. This combination of 
single and half precision gives rise to the name Mixed precision. It allows us 
to achieve the same accuracy as training with single precision, while 
decreasing the required memory and training or inference time.
+The float16 data type, is a 16 bit floating point representation according to 
the IEEE 754 standard. It has a dynamic range where the precision can go from 
0.0000000596046 (highest, for values closest to 0) to 32 (lowest, for values in 
the range 32768-65536). Despite the decreased precision when compared to single 
precision (float32), float16 computation can be much faster on supported 
hardware. The motivation for using float16 for deep learning comes from the 
idea that deep neural network architectures have natural resilience to errors 
due to backpropagation. Half precision is typically sufficient for training 
neural networks. This means that on hardware with specialized support for 
float16 computation we can greatly improve the speed of training and inference. 
This speedup results from faster matrix multiplication, saving on memory 
bandwidth and reduced communication costs. It also reduces the size of the 
model, allowing us to train larger models and use larger batch sizes. 
+The Volta range of Graphics Processing Units (GPUs) from Nvidia have Tensor 
Cores which perform efficient float16 computation. A tensor core allows 
accumulation of half precision products into single or half precision outputs. 
For the rest of this tutorial we assume that we are working with Nvidia's 
Tensor Cores on a Volta GPU.
+In this tutorial we will walk through how one can train deep learning neural 
networks with mixed precision on supported hardware. We will first see how to 
use float16 and then some techniques on achieving good performance and accuracy.
+## Prerequisites
+- Volta range of Nvidia GPUs
+- Cuda 9 or higher
+- CUDNN v7 or higher
 Review comment:
   Could you start with an overview that the tutorial covers both Gluon and 
Symbolic APIs?

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:

With regards,
Apache Git Services

Reply via email to