Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/272#discussion_r194151705

    --- Diff: doc/design/modules/neural-network.tex ---
    @@ -117,6 +122,26 @@ \subsubsection{Backpropagation}
     \[\boxed{\delta_{k}^j = \sum_{t=1}^{n_{k+1}} \left( \delta_{k+1}^t \cdot u_{k}^{jt} \right) \cdot \phi'(\mathit{net}_{k}^j)}\]
     where $k = 1,...,N-1$, and $j = 1,...,n_{k}$.
    +\paragraph{Momentum updates.}
    +Momentum\cite{momentum_ilya}\cite{momentum_cs231n} can help accelerate learning and avoid local minima when using gradient descent. We also support nesterov's accelarated gradient due to its look ahead characteristics. \\
    +Here we need to introduce two new variables namely velocity and momentum. momentum must be in the range 0 to 1, where 0 means no momentum.
    +The velocity is the same size as the coefficient and is accumulated in the direction of persistent reduction, which speeds up the optimization. The momentum value is responsible for damping the velocity and is analogous to the coefficient of friction. \\
    +In classical momentum we first correct the velocity, and then update the model with that velocity, whereas in Nesterov momentum, we first move the model in the direction of momentum*velocity , then correct the velocity and finally use the updated model to calculate the gradient. The main difference being that in classical momentum, we compute the gradient before updating the model whereas in nesterov we first update the model and then compute the gradient from the updated position.\\
    --- End diff --

`momentum*velocity ,` -> `momentum*velocity,`. The extra space before the comma is moving the `,` to the next line in the pdf.
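For context, the classical-vs-Nesterov distinction the quoted paragraph describes can be sketched in a few lines of Python. This is a minimal illustration of the two update rules, not MADlib's actual implementation; the function and variable names (`classical_momentum_step`, `nesterov_momentum_step`, `w`, `v`, `grad_fn`) are hypothetical, and the learning rate and momentum values are arbitrary.

```python
def classical_momentum_step(w, v, grad_fn, lr=0.01, momentum=0.9):
    """Classical momentum: compute the gradient at the CURRENT model,
    then correct the velocity and move the model by that velocity."""
    g = grad_fn(w)             # gradient at the current position
    v = momentum * v - lr * g  # damp old velocity, add new gradient step
    w = w + v                  # update the model with the velocity
    return w, v

def nesterov_momentum_step(w, v, grad_fn, lr=0.01, momentum=0.9):
    """Nesterov momentum: first move the model by momentum*velocity
    (the look-ahead), then compute the gradient at that updated
    position and use it to correct the velocity."""
    w_ahead = w + momentum * v  # look-ahead position
    g = grad_fn(w_ahead)        # gradient at the look-ahead position
    v = momentum * v - lr * g   # correct the velocity
    w = w + v                   # final model update
    return w, v

# Toy check: minimise f(w) = w^2 (gradient 2w) from the same start.
grad = lambda w: 2.0 * w
w_c, v_c = 5.0, 0.0
w_n, v_n = 5.0, 0.0
for _ in range(100):
    w_c, v_c = classical_momentum_step(w_c, v_c, grad)
    w_n, v_n = nesterov_momentum_step(w_n, v_n, grad)
```

Both variants apply the same damped-velocity update; the only difference, as the diff text says, is whether the gradient is evaluated before (classical) or after (Nesterov) the momentum move.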