Github user njayaram2 commented on a diff in the pull request: https://github.com/apache/madlib/pull/272#discussion_r192246463 --- Diff: doc/design/modules/neural-network.tex --- @@ -117,6 +117,24 @@ \subsubsection{Backpropagation} \[\boxed{\delta_{k}^j = \sum_{t=1}^{n_{k+1}} \left( \delta_{k+1}^t \cdot u_{k}^{jt} \right) \cdot \phi'(\mathit{net}_{k}^j)}\] where $k = 1,...,N-1$, and $j = 1,...,n_{k}$. +\paragraph{Momentum updates.} +Momentum\cite{momentum_ilya}\cite{momentum_cs231n} can help accelerate learning and avoid local minima when using gradient descent. We also support nesterov's accelarated gradient due to its look ahead characteristics. \\ +Here we need to introduce two new variables namely velocity and momentum. momentum must be in the range 0 to 1, where 0 means no momentum. The momentum value is responsible for damping the velocity and is analogous to the coefficient of friction. \\ --- End diff -- We only describe the range for momentum, but nothing about velocity before the next line begins. Can we say a bit about what velocity is before we describe how it is used?
---