# [GitHub] incubator-madlib pull request #162: MLP: Multilayer Perceptron Phase 2

Github user njayaram2 commented on a diff in the pull request:

--- Diff: doc/design/modules/neural-network.tex ---
@@ -46,41 +47,49 @@ \subsection{Formal Description}
In the remaining part of this section, we will give a formal description
of the derivation of objective function and its gradient.

\paragraph{Objective function.}
-We mostly follow the notations in example 1.5.3 from Bertsekas
\cite{bertsekas1999nonlinear}, for a multilayer perceptron that has $N$ layers
(stages), and the $k$th stage has $n_k$ activation units ($\phi : \mathbb{R} \to \mathbb{R}$), the objective function is given as
-$f_{(y, z)}(u) = \frac{1}{2} \|h(u, y) - z\|_2^2,$
-where $y \in \mathbb{R}^{n_0}$ is the input vector, $z \in \mathbb{R}^{n_N}$ is the output vector,
+We mostly follow the notations in example 1.5.3 from Bertsekas
\cite{bertsekas1999nonlinear}, for a multilayer perceptron that has $N$ layers
(stages), and the $k$th stage has $n_k$ activation units ($\phi : \mathbb{R} \to \mathbb{R}$), the objective function for regression is given as
+$f_{(x, y)}(u) = \frac{1}{2} \|h(u, x) - y\|_2^2,$
+and for classification the objective function is given as
+$f_{(x, y)}(u) = \sum_i (\log(h_i(u, x)) * z_i + (1-\log(h_i(u, x))) *( 1- z_i) ,$
+where $x \in \mathbb{R}^{n_0}$ is the input vector, $y \in \mathbb{R}^{n_N}$ is the output vector (one hot encoded for classification),
\footnote{Of course, the objective function can be defined over a set of
input-output vector pairs, which is simply given as the addition of the above
$f$.}
and the coefficients are given as
-$u = \{ u_{k-1}^{sj} \; | \; k = 1,...,N, \: s = 0,...,n_{k-1}, \: j = 1,...,n_k\}$
+$u = \{ u_{k-1}^{sj} \; | \; k = 1,...,N, \: s = 0,...,n_{k-1}, \: j = 1,...,n_k\},$
+And are initialized from a uniform distribution as follows:
+$u_{k}^{sj} = uniform(-r,r),$
+where r is defined as follows:
+$r = \sqrt{\frac{6}{n_k+n_{k+1}}}$
+With regularization, an additional term enters the objective function,
given as
+$\sum_{u_k^{sj}} \frac{1}{2} \lambda u_k^{sj2}$
This still leaves $h : \mathbb{R}^{n_0} \to \mathbb{R}^{n_N}$ as an open
item.
-Let $x_k \in \mathbb{R}^{n_k}, k = 1,...,N$ be the output vector of the
$k$th layer. Then we define $h(u, y) = x_N$, based on setting $x_0 = y$ and the
$j$th component of $x_k$ is given in an iterative fashion as
-\footnote{$x_k^0 \equiv 1$ is used to simplified the notations, and
$x_k^0$ is not a component of $x_k$, for any $k = 0,...,N$.}
+Let $o_k \in \mathbb{R}^{n_k}, k = 1,...,N$ be the output vector of the
$k$th layer. Then we define $h(u, x) = o_N$, based on setting $o_0 = x$ and the
$j$th component of $o_k$ is given in an iterative fashion as
+\footnote{$o_k^0 \equiv 1$ is used to simplified the notations, and
$o_k^0$ is not a component of $o_k$, for any $k = 0,...,N$.}
\begin{alignedat}{5}
-    x_k^j = \phi \left( \sum_{s=0}^{n_{k-1}} x_{k-1}^s u_{k-1}^{sj} \right), &\quad k = 1,...,N, \; j = 1,...,n_k
+    o_k^j = \phi \left( \sum_{s=0}^{n_{k-1}} o_{k-1}^s u_{k-1}^{sj} \right), &\quad k = 1,...,N, \; j = 1,...,n_k
\end{alignedat}

\paragraph{Gradient of the End Layer.}
Let's first handle $u_{N-1}^{st}, s = 0,...,n_{N-1}, t = 1,...,n_N$.
-Let $z^t$ denote the $t$th component of $z \in \mathbb{R}^{n_N}$, and
$h^t$ the $t$th component of output of $h$.
+Let $y^t$ denote the $t$th component of $y \in \mathbb{R}^{n_N}$, and
$h^t$ the $t$th component of output of $h$.
\begin{aligned}
\frac{\partial f}{\partial u_{N-1}^{st}}
-    &= \left( h^t(u, y) - z^t \right) \cdot \frac{\partial h^t(u, y)}{\partial u_{N-1}^{st}} \\
-    &= \left( x_N^t - z^t \right) \cdot \frac{\partial x_N^t}{\partial u_{N-1}^{st}} \\
-    &= \left( x_N^t - z^t \right) \cdot \frac{\partial \phi \left( \sum_{s=0}^{n_{N-1}} x_{N-1}^s u_{N-1}^{st} \right)}{\partial u_{N-1}^{st}} \\
-    &= \left( x_N^t - z^t \right) \cdot \phi' \left( \sum_{s=0}^{n_{N-1}} x_{N-1}^s u_{N-1}^{st} \right) \cdot x_{N-1}^s \\
+    &= \left( h^t(u, x) - y^t \right) \cdot \frac{\partial h^t(u, x)}{\partial u_{N-1}^{st}} \\
+    &= \left( o_N^t - y^t \right) \cdot \frac{\partial o_N^t}{\partial u_{N-1}^{st}} \\
+    &= \left( o_N^t - y^t \right) \cdot \frac{\partial \phi \left( \sum_{s=0}^{n_{N-1}} o_{N-1}^s u_{N-1}^{st} \right)}{\partial u_{N-1}^{st}} \\
+    &= \left( o_N^t - y^t \right) \cdot \phi' \left( \sum_{s=0}^{n_{N-1}} o_{N-1}^s u_{N-1}^{st} \right) \cdot o_{N-1}^s \\
\end{aligned}
To ease the notation, let the input vector of the $j$th activation unit of
the $(k+1)$th layer be
--- End diff --

$(k+1)$th -> $(k+1)^{th}$.
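
For readers following the quoted section, the initialization, the layer recursion, and the end-layer gradient for the regression objective can be sketched in plain Python. This is only an illustration under stated assumptions: $\phi$ is taken to be the sigmoid (the diff leaves it generic), and every function name below is hypothetical and not taken from the MADlib code.

```python
import math
import random


def init_weights(layer_sizes, rng):
    """Draw u_k[s][j] from uniform(-r, r) with r = sqrt(6 / (n_k + n_{k+1})).

    Row s = 0 of each matrix multiplies the constant o_k^0 = 1 (the bias).
    """
    weights = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        r = math.sqrt(6.0 / (n_in + n_out))
        weights.append([[rng.uniform(-r, r) for _ in range(n_out)]
                        for _ in range(n_in + 1)])
    return weights


def sigmoid(a):
    # Assumed activation phi; the design document does not fix a choice here.
    return 1.0 / (1.0 + math.exp(-a))


def forward(weights, x):
    """Return [o_0, ..., o_N], where o_0 = x and o_N = h(u, x)."""
    outputs = [list(x)]
    for u in weights:
        o_prev = [1.0] + outputs[-1]  # prepend o^0 = 1
        n_out = len(u[0])
        outputs.append([
            sigmoid(sum(o_prev[s] * u[s][j] for s in range(len(o_prev))))
            for j in range(n_out)
        ])
    return outputs


def loss(weights, x, y):
    """Regression objective f = 1/2 * ||h(u, x) - y||_2^2 for one example."""
    o_N = forward(weights, x)[-1]
    return 0.5 * sum((o - t) ** 2 for o, t in zip(o_N, y))


def end_layer_gradient(weights, x, y):
    """Entry [s][t] is (o_N^t - y^t) * phi'(net^t) * o_{N-1}^s."""
    outputs = forward(weights, x)
    o_N = outputs[-1]
    o_prev = [1.0] + outputs[-2]
    # For the sigmoid, phi'(net) = phi(net) * (1 - phi(net)) = o * (1 - o).
    delta = [(o - t) * o * (1.0 - o) for o, t in zip(o_N, y)]
    return [[o_s * d for d in delta] for o_s in o_prev]
```

One way to gain confidence in the last line of the derivation is to compare `end_layer_gradient` against a central finite difference of `loss` for each entry of the final weight matrix, for example with `init_weights([3, 4, 2], random.Random(0))` and an arbitrary input and target.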

