Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/272#discussion_r192245605
  
    --- Diff: doc/design/modules/neural-network.tex ---
    @@ -117,6 +117,24 @@ \subsubsection{Backpropagation}
     \[\boxed{\delta_{k}^j = \sum_{t=1}^{n_{k+1}} \left( \delta_{k+1}^t \cdot u_{k}^{jt} \right) \cdot \phi'(\mathit{net}_{k}^j)}\]
     where $k = 1,...,N-1$, and $j = 1,...,n_{k}$.
     
    +\paragraph{Momentum updates.}
    +Momentum\cite{momentum_ilya}\cite{momentum_cs231n} can help accelerate learning and avoid local minima when using gradient descent. We also support Nesterov's accelerated gradient due to its look-ahead characteristics. \\
    +Here we need to introduce two new variables, namely velocity and momentum. Momentum must be in the range 0 to 1, where 0 means no momentum. The momentum value is responsible for damping the velocity and is analogous to the coefficient of friction. \\
    +In classical momentum you first correct the velocity and step with that velocity, whereas in Nesterov momentum you first step in the velocity direction then make a correction to the velocity vector based on the new location. \\
    --- End diff ---
    
    `step with that velocity` is a little confusing to me. Do we have some 
source where it is defined this way?
    If it's any better, can we use the following text to say what the 
difference between momentum and NAG is (source is 
http://www.cs.utoronto.ca/~ilya/pubs/ilya_sutskever_phd_thesis.pdf):
    ```
    ... the key difference between momentum and Nesterov’s
    accelerated gradient is that momentum computes the gradient before applying 
the velocity, while Nesterov’s
    accelerated gradient computes the gradient after doing so.
    ```
    If `step with that velocity` is a standard way of defining it, then I am 
okay with it.
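    
    If it helps, the two update rules could also be written out side by side in the design doc. Here is a rough LaTeX sketch based on the thesis formulation above (the symbols $v$, $\mu$, $\eta$, $w$ are placeholders I picked, not the doc's existing notation):
    ```
    % Rough sketch only; placeholder symbols:
    %   v = velocity, \mu = momentum coefficient, \eta = learning rate, w = weights
    % Classical momentum: gradient taken at the current point, before applying the velocity.
    \[ v \leftarrow \mu v - \eta \nabla f(w), \qquad w \leftarrow w + v \]
    % Nesterov's accelerated gradient: gradient taken at the look-ahead point w + \mu v.
    \[ v \leftarrow \mu v - \eta \nabla f(w + \mu v), \qquad w \leftarrow w + v \]
    ```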
    
    This comment applies to user and online docs too.

