szha commented on a change in pull request #7395: Drafted documentation for 
autograd. 
URL: https://github.com/apache/incubator-mxnet/pull/7395#discussion_r132244948
 
 

 ##########
 File path: docs/api/python/autograd.md
 ##########
 @@ -9,6 +9,78 @@
 .. warning:: This package is currently experimental and may change in the near 
future.
 ```
 
+## Overview
+
+The ``autograd`` package consists of functions that enable automatic
+differentiation of scalar values with respect to NDArrays.
+For example, in machine learning applications,
+``autograd`` is often used to calculate the gradients
+of loss functions with respect to parameters.
+
+While automatic differentiation was previously available
+through the symbolic API, ``autograd`` works in the fully imperative context.
+In other words, we can just work with NDArrays
+and do not need to specify a computational graph a priori
+in order to take gradients.
+Of course, in order to differentiate a value ``y`` with respect
+to an NDArray ``x``, we need to know how
+the value of ``y`` depends on ``x``.
+You might wonder: how does ``autograd`` do this
+without a pre-specified computation graph?
+
+The trick here is that ``autograd`` builds a computation graph on the fly.
+When we calculate ``y = f(x)``, MXNet can remember
+how the value of ``y`` relates to ``x``.
+It's as if MXNet turned on a tape recorder to keep track of
+how each value was generated.
+To indicate to MXNet that we want to turn on the metaphorical tape recorder,
+all we have to do is place the code in a ``with autograd.record():`` block.
+For any variable ``x`` whose gradient we might later want to access,
+we can call ``x.attach_grad()`` to allocate space for that gradient.
+Then, once we have computed a value ``y`` inside
+a ``with autograd.record():`` block, calling ``y.backward()``
+populates ``x.grad`` with the gradient of ``y`` with respect to ``x``.
+
+```python
+>>> x = mx.nd.array([1,2,3,4])
+>>> x.attach_grad()
+>>> with mx.autograd.record():
+...     y = x * x
+>>> y.backward()
+>>> print(x.grad)
+[ 2.  4.  6.  8.]
+<NDArray 4 @cpu(0)>
+```
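+
+Following the same pattern, computing the gradient of a simple loss
+with respect to a parameter (the use case mentioned in the overview)
+might look like the following sketch; the toy data and the parameter
+``w`` here are made up purely for illustration:
+
+```python
+>>> X = mx.nd.array([[1, 2], [3, 4]])   # toy inputs
+>>> labels = mx.nd.array([1, 0])        # toy targets
+>>> w = mx.nd.array([0.5, -0.5])        # parameter we want gradients for
+>>> w.attach_grad()                     # allocate space for d(loss)/dw
+>>> with mx.autograd.record():
+...     loss = ((mx.nd.dot(X, w) - labels) ** 2).sum()
+>>> loss.backward()
+>>> print(w.grad)
+[ -6. -10.]
+<NDArray 2 @cpu(0)>
+```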
+
+### ``train_mode`` and ``predict_mode``
+
+Often, we want to define functions that behave differently
+when we are training a model than when we are making predictions.
+By default, MXNet assumes we are in predict mode.
+However, when we take gradients, we are usually in the process of training.
+MXNet lets us decouple *train_mode* vs. *predict_mode* from
+*recording* vs. *not recording*.
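+
+A rough sketch of how this decoupling behaves (the ``is_training`` helper
+used to inspect the mode is an assumption for illustration):
+
+```python
+>>> with mx.autograd.record(train_mode=False):  # record, but stay in predict mode
+...     print(mx.autograd.is_training())
+False
+>>> with mx.autograd.record():                  # the default is train_mode=True
+...     with mx.autograd.predict_mode():        # temporarily switch to predict mode
+...         print(mx.autograd.is_training())
+False
+```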
 
 Review comment:
   Users might not be aware of the coupling, since `record(train_mode=True)` is 
the default form of record.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
