Github user njayaram2 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/272#discussion_r192245605
--- Diff: doc/design/modules/neural-network.tex ---
@@ -117,6 +117,24 @@ \subsubsection{Backpropagation}
\[\boxed{\delta_{k}^j = \sum_{t=1}^{n_{k+1}} \left( \delta_{k+1}^t \cdot
u_{k}^{jt} \right) \cdot \phi'(\mathit{net}_{k}^j)}\]
where $k = 1,...,N-1$, and $j = 1,...,n_{k}$.
+\paragraph{Momentum updates.}
+Momentum\cite{momentum_ilya}\cite{momentum_cs231n} can help accelerate
learning and avoid local minima when using gradient descent. We also support
Nesterov's accelerated gradient due to its look-ahead characteristics. \\
+Here we need to introduce two new variables, namely velocity and momentum.
The momentum must be in the range 0 to 1, where 0 means no momentum. The
momentum value is responsible for damping the velocity and is analogous to the
coefficient of friction. \\
+In classical momentum, you first correct the velocity and step with that
velocity, whereas in Nesterov momentum, you first step in the velocity direction
and then make a correction to the velocity vector based on the new location. \\
--- End diff ---
`step with that velocity` is a little confusing to me. Do we have some
source where it is defined this way?
If it's any better, can we use the following text to say what the
difference between momentum and NAG is (source is
http://www.cs.utoronto.ca/~ilya/pubs/ilya_sutskever_phd_thesis.pdf):
```
... the key difference between momentum and Nesterov's accelerated
gradient is that momentum computes the gradient before applying
the velocity, while Nesterov's accelerated gradient computes the
gradient after doing so.
```
If `step with that velocity` is a standard way of defining it, then I am
okay with it.
This comment applies to user and online docs too.
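For what it's worth, a minimal sketch of the two update rules might make the distinction concrete in the design doc. The notation below ($\mu$ for the momentum coefficient, $\eta$ for the learning rate, $v$ for the velocity, $w$ for the weights) is just my suggestion, not taken from the diff:
```latex
% Classical momentum: the gradient is computed at the current weights w,
% i.e. before the velocity is applied.
v \leftarrow \mu v - \eta \nabla f(w), \qquad w \leftarrow w + v

% Nesterov's accelerated gradient: the gradient is computed at the
% look-ahead point w + \mu v, i.e. after the velocity has been applied.
v \leftarrow \mu v - \eta \nabla f(w + \mu v), \qquad w \leftarrow w + v
```
Something along these lines (here, or in the user/online docs) could replace the `step with that velocity` wording if we decide to change it.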
---