Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/incubator-madlib/pull/162#discussion_r132339837

    --- Diff: doc/design/modules/neural-network.tex ---
    @@ -46,41 +47,49 @@ \subsection{Formal Description}
     In the remaining part of this section, we will give a formal description of the derivation of objective function and its gradient.
     \paragraph{Objective function.}
    -We mostly follow the notations in example 1.5.3 from Bertsekas \cite{bertsekas1999nonlinear}, for a multilayer perceptron that has $N$ layers (stages), and the $k$th stage has $n_k$ activation units ($\phi : \mathbb{R} \to \mathbb{R}$), the objective function is given as
    -\[f_{(y, z)}(u) = \frac{1}{2} \|h(u, y) - z\|_2^2,\]
    -where $y \in \mathbb{R}^{n_0}$ is the input vector, $z \in \mathbb{R}^{n_N}$ is the output vector,
    +We mostly follow the notations in example 1.5.3 from Bertsekas \cite{bertsekas1999nonlinear}, for a multilayer perceptron that has $N$ layers (stages), and the $k$th stage has $n_k$ activation units ($\phi : \mathbb{R} \to \mathbb{R}$), the objective function for regression is given as
    +\[f_{(x, y)}(u) = \frac{1}{2} \|h(u, x) - y\|_2^2,\]
    +and for classification the objective function is given as
    +\[f_{(x, y)}(u) = \sum_i (\log(h_i(u, x)) * z_i + (1-\log(h_i(u, x))) *( 1- z_i) ,\]
    +where $x \in \mathbb{R}^{n_0}$ is the input vector, $y \in \mathbb{R}^{n_N}$ is the output vector (one hot encoded for classification),
     \footnote{Of course, the objective function can be defined over a set of input-output vector pairs, which is simply given as the addition of the above $f$.}
     and the coefficients are given as
    -\[u = \{ u_{k-1}^{sj} \; | \; k = 1,...,N, \: s = 0,...,n_{k-1}, \: j = 1,...,n_k\}\]
    +\[u = \{ u_{k-1}^{sj} \; | \; k = 1,...,N, \: s = 0,...,n_{k-1}, \: j = 1,...,n_k\},\]
    +And are initialized from a uniform distribution as follows:
    +\[u_{k}^{sj} = uniform(-r,r),\]
    +where r is defined as follows:
    +\[r = \sqrt{\frac{6}{n_k+n_{k+1}}}\]
    +With regularization, an additional term enters the objective function, given as
    +\[\sum_{u_k^{sj}} \frac{1}{2} \lambda u_k^{sj2} \]
     This still leaves $h : \mathbb{R}^{n_0} \to \mathbb{R}^{n_N}$ as an open item.
    -Let $x_k \in \mathbb{R}^{n_k}, k = 1,...,N$ be the output vector of the $k$th layer. Then we define $h(u, y) = x_N$, based on setting $x_0 = y$ and the $j$th component of $x_k$ is given in an iterative fashion as
    -\footnote{$x_k^0 \equiv 1$ is used to simplified the notations, and $x_k^0$ is not a component of $x_k$, for any $k = 0,...,N$.}
    +Let $o_k \in \mathbb{R}^{n_k}, k = 1,...,N$ be the output vector of the $k$th layer. Then we define $h(u, x) = o_N$, based on setting $o_0 = x$ and the $j$th component of $o_k$ is given in an iterative fashion as
    +\footnote{$o_k^0 \equiv 1$ is used to simplified the notations, and $o_k^0$ is not a component of $o_k$, for any $k = 0,...,N$.}
    --- End diff --

    `$j$th` -> `$j$^{th}`. Similar changes in several other places.
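
    A note on the suggested replacement: typed literally, `$j$^{th}` leaves the caret outside math mode, which LaTeX normally rejects with a "Missing $ inserted" error. Two common ways to typeset the superscript ordinal are sketched below. Which form the project prefers is not stated in this thread, and the first option assumes the design document loads `amsmath` (needed for `\text`), which is not confirmed here.

```latex
% Option 1: keep the whole ordinal inside math mode.
% Assumes \usepackage{amsmath} in the preamble, which provides \text.
the $j^{\text{th}}$ component of $o_k$

% Option 2: math-mode index followed by a text-mode superscript.
% \textsuperscript is a standard LaTeX2e command; no extra package is needed.
the $j$\textsuperscript{th} component of $o_k$
```

    Either form could be applied the same way to the `$k$th` occurrences in the hunk above.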
