Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/272#discussion_r194151705

    --- Diff: doc/design/modules/neural-network.tex ---
    @@ -117,6 +122,26 @@ \subsubsection{Backpropagation}
     \[\boxed{\delta_{k}^j = \sum_{t=1}^{n_{k+1}} \left( \delta_{k+1}^t \cdot u_{k}^{jt} \right) \cdot \phi'(\mathit{net}_{k}^j)}\]
     where $k = 1,...,N-1$, and $j = 1,...,n_{k}$.
    +\paragraph{Momentum updates.}
    +Momentum\cite{momentum_ilya}\cite{momentum_cs231n} can help accelerate learning and avoid local minima when using gradient descent. We also support nesterov's accelarated gradient due to its look ahead characteristics. \\
    +Here we need to introduce two new variables namely velocity and momentum. momentum must be in the range 0 to 1, where 0 means no momentum.
    +The velocity is the same size as the coefficient and is accumulated in the direction of persistent reduction, which speeds up the optimization. The momentum value is responsible for damping the velocity and is analogous to the coefficient of friction. \\
    +In classical momentum we first correct the velocity, and then update the model with that velocity, whereas in Nesterov momentum, we first move the model in the direction of momentum*velocity , then correct the velocity and finally use the updated model to calculate the gradient. The main difference being that in classical momentum, we compute the gradient before updating the model whereas in nesterov we first update the model and then compute the gradient from the updated position.\\
    --- End diff --

`momentum*velocity ,` -> `momentum*velocity,`. The extra space before the comma is moving the `,` to the next line in the pdf.
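For context, the classical-vs-Nesterov distinction the quoted paragraph describes can be sketched in a few lines of Python. This is a minimal illustration of the two update rules, not MADlib's actual implementation; the function and variable names (`classical_momentum_step`, `nesterov_momentum_step`, `w`, `v`, `grad_fn`) are hypothetical, and the learning rate and momentum values are arbitrary.

```python
def classical_momentum_step(w, v, grad_fn, lr=0.01, momentum=0.9):
    """Classical momentum: compute the gradient at the CURRENT model,
    then correct the velocity and move the model by that velocity."""
    g = grad_fn(w)             # gradient at the current position
    v = momentum * v - lr * g  # damp old velocity, add new gradient step
    w = w + v                  # update the model with the velocity
    return w, v

def nesterov_momentum_step(w, v, grad_fn, lr=0.01, momentum=0.9):
    """Nesterov momentum: first move the model by momentum*velocity
    (the look-ahead), then compute the gradient at that updated
    position and use it to correct the velocity."""
    w_ahead = w + momentum * v  # look-ahead position
    g = grad_fn(w_ahead)        # gradient at the look-ahead position
    v = momentum * v - lr * g   # correct the velocity
    w = w + v                   # final model update
    return w, v

# Toy check: minimise f(w) = w^2 (gradient 2w) from the same start.
grad = lambda w: 2.0 * w
w_c, v_c = 5.0, 0.0
w_n, v_n = 5.0, 0.0
for _ in range(100):
    w_c, v_c = classical_momentum_step(w_c, v_c, grad)
    w_n, v_n = nesterov_momentum_step(w_n, v_n, grad)
```

Both variants apply the same damped-velocity update; the only difference, as the diff text says, is whether the gradient is evaluated before (classical) or after (Nesterov) the momentum move.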