New module: Multilayer Perceptron

JIRA: MADLIB-413

The core design and architecture for this work was built by
Xixuan Feng (Aaron) <xixuan.f...@gmail.com>.

Additional author: Rahul Iyer <ri...@apache.org>

This commit adds the capability to build fully connected neural networks.
The current work includes train and predict functions and is limited to the
sigmoid activation function. Supported loss functions are mean squared
error for regression and cross entropy for classification.

Closes #149


Project: http://git-wip-us.apache.org/repos/asf/incubator-madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-madlib/commit/4fcb60ed
Tree: http://git-wip-us.apache.org/repos/asf/incubator-madlib/tree/4fcb60ed
Diff: http://git-wip-us.apache.org/repos/asf/incubator-madlib/diff/4fcb60ed

Branch: refs/heads/master
Commit: 4fcb60ed8bea793a98eca802886492d00ef2f676
Parents: e9e365a
Author: Cooper Sloan <cooper.sl...@gmail.com>
Authored: Fri Jun 16 17:41:07 2017 -0700
Committer: Rahul Iyer <ri...@apache.org>
Committed: Thu Jul 20 15:31:38 2017 -0700

----------------------------------------------------------------------
 doc/design/design.tex                           |   1 +
 doc/design/modules/neural-network.tex           | 199 +++++
 doc/literature.bib                              |  18 +
 doc/mainpage.dox.in                             |   4 +
 src/modules/convex/algo/loss.hpp                |   6 +-
 src/modules/convex/convex.hpp                   |   3 +-
 src/modules/convex/mlp_igd.cpp                  | 237 ++++++
 src/modules/convex/mlp_igd.hpp                  |  56 ++
 src/modules/convex/task/mlp.hpp                 | 332 ++++++++
 src/modules/convex/type/model.hpp               | 140 +++
 src/modules/convex/type/state.hpp               | 173 +++-
 src/modules/convex/type/tuple.hpp               |   7 +
 src/ports/postgres/modules/convex/mlp.sql_in    | 752 +++++++++++++++++
 src/ports/postgres/modules/convex/mlp_igd.py_in | 752 +++++++++++++++++
 .../postgres/modules/convex/test/mlp.sql_in     | 845 +++++++++++++++++++
 .../modules/utilities/validate_args.py_in       |  43 +-
 16 files changed, 3546 insertions(+), 22 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-madlib/blob/4fcb60ed/doc/design/design.tex
----------------------------------------------------------------------
diff --git a/doc/design/design.tex b/doc/design/design.tex
index 30e0ef7..e9ed7b8 100644
--- a/doc/design/design.tex
+++ b/doc/design/design.tex
@@ -230,6 +230,7 @@
 \input{modules/random-forests}
 \input{modules/SVM}
 \input{modules/graph}
+\input{modules/neural-network}
 \printbibliography
 
 \end{document}

http://git-wip-us.apache.org/repos/asf/incubator-madlib/blob/4fcb60ed/doc/design/modules/neural-network.tex
----------------------------------------------------------------------
diff --git a/doc/design/modules/neural-network.tex 
b/doc/design/modules/neural-network.tex
new file mode 100644
index 0000000..8802361
--- /dev/null
+++ b/doc/design/modules/neural-network.tex
@@ -0,0 +1,199 @@
+% Licensed to the Apache Software Foundation (ASF) under one
+% or more contributor license agreements.  See the NOTICE file
+% distributed with this work for additional information
+% regarding copyright ownership.  The ASF licenses this file
+% to you under the Apache License, Version 2.0 (the
+% "License"); you may not use this file except in compliance
+% with the License.  You may obtain a copy of the License at
+
+%   http://www.apache.org/licenses/LICENSE-2.0
+
+% Unless required by applicable law or agreed to in writing,
+% software distributed under the License is distributed on an
+% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+% KIND, either express or implied.  See the License for the
+% specific language governing permissions and limitations
+% under the License.
+
+% When using TeXShop on the Mac, let it know the root document.
+% The following must be one of the first 20 lines.
+% !TEX root = ../design.tex
+
+\chapter{Neural Network}
+
+\begin{moduleinfo}
+\item[Authors] {Xixuan Feng}
+\end{moduleinfo}
+
+% Abstract. What is the problem we want to solve?
+This module implements artificial neural networks \cite{ann_wiki}.
+
+\section{Multilayer Perceptron}
+The multilayer perceptron is arguably the most popular among the many neural network models \cite{mlp_wiki}.
+Here, we learn the coefficients by minimizing a least-squares objective function (\cite{bertsekas1999nonlinear}, example 1.5.3).
+
+% Background. Why can we solve the problem with gradient-based methods?
+\subsection{Solving as a Convex Program}
+Although the objective function is not necessarily convex, gradient descent and incremental gradient descent are still commonly used algorithms for learning the coefficients.
+To clarify, gradient-based methods are not distinct from the well-known backpropagation algorithm, which is essentially a way to compute the gradient value.
+
+\subsection{Formal Description}
+Given the convex programming framework, the main tasks in implementing a learner are:
+(a) choosing a subset of algorithms;
+(b) implementing the related computation logic for the objective function, e.g., its gradient.
+
+For the multilayer perceptron, we choose incremental gradient descent (IGD).
+In the remainder of this section, we give a formal derivation of the objective function and its gradient.
+
+\paragraph{Objective function.}
+We mostly follow the notation of example 1.5.3 in Bertsekas \cite{bertsekas1999nonlinear}. For a multilayer perceptron with $N$ layers (stages), where the $k$th stage has $n_k$ activation units ($\phi : \mathbb{R} \to \mathbb{R}$), the objective function is given as
+\[f_{(y, z)}(u) = \frac{1}{2} \|h(u, y) - z\|_2^2,\]
+where $y \in \mathbb{R}^{n_0}$ is the input vector, $z \in \mathbb{R}^{n_N}$ 
is the output vector,
+\footnote{Of course, the objective function can also be defined over a set of input-output vector pairs, in which case it is simply the sum of the above $f$ over all pairs.}
+and the coefficients are given as
+\[u = \{ u_{k-1}^{sj} \; | \; k = 1,...,N, \: s = 0,...,n_{k-1}, \: j = 
1,...,n_k\}\]
+This still leaves $h : \mathbb{R}^{n_0} \to \mathbb{R}^{n_N}$ as an open item.
+Let $x_k \in \mathbb{R}^{n_k}, k = 1,...,N$ be the output vector of the $k$th layer. Then we define $h(u, y) = x_N$, where $x_0 = y$ and the $j$th component of $x_k$ is given iteratively as
+\footnote{$x_k^0 \equiv 1$ is used to simplify the notation; $x_k^0$ is not a component of $x_k$, for any $k = 0,...,N$.}
+\[\begin{alignedat}{5}
+    x_k^j = \phi \left( \sum_{s=0}^{n_{k-1}} x_{k-1}^s u_{k-1}^{sj} \right), 
&\quad k = 1,...,N, \; j = 1,...,n_k
+\end{alignedat}\]
+
+\paragraph{Gradient of the End Layer.}
+Let's first handle $u_{N-1}^{st}, s = 0,...,n_{N-1}, t = 1,...,n_N$.
+Let $z^t$ denote the $t$th component of $z \in \mathbb{R}^{n_N}$, and $h^t$ 
the $t$th component of output of $h$.
+\[\begin{aligned}
+    \frac{\partial f}{\partial u_{N-1}^{st}}
+    &= \left( h^t(u, y) - z^t \right) \cdot \frac{\partial h^t(u, y)}{\partial 
u_{N-1}^{st}} \\
+    &= \left( x_N^t - z^t \right) \cdot \frac{\partial x_N^t}{\partial 
u_{N-1}^{st}} \\
+    &= \left( x_N^t - z^t \right) \cdot \frac{\partial \phi \left( 
\sum_{s=0}^{n_{N-1}} x_{N-1}^s u_{N-1}^{st} \right)}{\partial u_{N-1}^{st}} \\
+    &= \left( x_N^t - z^t \right) \cdot \phi' \left( \sum_{s=0}^{n_{N-1}} 
x_{N-1}^s u_{N-1}^{st} \right) \cdot x_{N-1}^s \\
+\end{aligned}\]
+To ease the notation, let the input to the $j$th activation unit of the $k$th layer be
+\[\mathit{net}_k^j =\sum_{s=0}^{n_{k-1}} x_{k-1}^s u_{k-1}^{sj},\]
+where $k = 1,...,N, \; j = 1,...,n_k$, and note that $x_k^j 
=\phi(\mathit{net}_k^j)$. Finally, the gradient
+\[\frac{\partial f}{\partial u_{N-1}^{st}} = \left( x_N^t - z^t \right) \cdot 
\phi' ( \mathit{net}_N^t ) \cdot x_{N-1}^s\]
+For any $s = 0,...,n_{N-1}, t = 1,...,n_N$, we are given $z^t$, and $x_N^t, \mathit{net}_N^t, x_{N-1}^s$ can be computed by iterating forward through the network layer by layer (the feed-forward pass). Therefore, we now know how to compute the gradient with respect to the end-layer coefficients $u_{N-1}^{st}, s = 0,...,n_{N-1}, t = 1,...,n_N$.
+
+\subsubsection{Backpropagation}
+For inner (hidden) layers, it is more difficult to compute the partial derivative with respect to the inputs of the activation units (i.e., $\mathit{net}_k, k = 1,...,N-1$).
+That said, $\frac{\partial f}{\partial \mathit{net}_N^t} = (x_N^t - z^t) \phi'(\mathit{net}_N^t)$ is easy, where $t = 1,...,n_N$, but $\frac{\partial f}{\partial \mathit{net}_k^j}$ is hard, where $k = 1,...,N-1, j = 1,...,n_k$.
+This hard-to-compute quantity is referred to as the \textit{delta error}; let $\delta_k^j = \frac{\partial f}{\partial \mathit{net}_k^j}$, where $k = 1,...,N-1, j = 1,...,n_k$.
+Once the delta errors are known, the gradient can be easily computed as follows
+\[\frac{\partial f}{\partial u_{k-1}^{sj}} = \boxed{\frac{\partial f}{\partial 
\mathit{net}_k^j}} \cdot \frac{\partial \mathit{net}_k^j}{\partial 
u_{k-1}^{sj}} = \boxed{\delta_k^j} x_{k-1}^s,\]
+where $k = 1,...,N-1, s = 0,...,n_{k-1}, j = 1,...,n_k$.
+To compute the delta errors, we introduce the popular backpropagation procedure below.
+
+\paragraph{Error Back Propagation.}
+Since we know how to compute $\delta_N^t, t = 1,...,n_N$, we try to compute 
$\delta_{k}^j, j = 1,...,n_{k}$, given $\delta_{k+1}^t, t = 1,...,n_{k+1}$, for 
any $k = 1,...,N-1$.
+First,
+\[
+    \delta_{k}^j
+    = \frac{\partial f}{\partial \mathit{net}_{k}^j}
+    = \frac{\partial f}{\partial x_{k}^j} \cdot \frac{\partial 
x_{k}^j}{\partial \mathit{net}_{k}^j}
+    = \frac{\partial f}{\partial x_{k}^j} \cdot \phi'(\mathit{net}_{k}^j)
+\]
+The remaining equation we need follows from the multivariate chain rule: $f$ depends on $x_{k}^j$ only through the inputs $\mathit{net}_{k+1}^t$ of the next layer, so \cite{mlp_gradient_wisc}
+\[\begin{alignedat}{5}
+    \frac{\partial f}{\partial x_{k}^j} = \sum_{t=1}^{n_{k+1}} \left( 
\frac{\partial f}{\partial \mathit{net}_{k+1}^t} \cdot \frac{\partial 
\mathit{net}_{k+1}^t}{\partial x_{k}^j} \right),
+    &\quad k = 1,...,N-1, \: j = 1,...,n_{k}
+\end{alignedat}\]
+Using this equation, we can compute the delta errors backward iteratively
+\[\begin{aligned}
+    \delta_{k}^j
+    &= \frac{\partial f}{\partial x_{k}^j} \cdot \phi'(\mathit{net}_{k}^j) \\
+    &= \sum_{t=1}^{n_{k+1}} \left( \frac{\partial f}{\partial 
\mathit{net}_{k+1}^t} \cdot \frac{\partial \mathit{net}_{k+1}^t}{\partial 
x_{k}^j} \right) \cdot \phi'(\mathit{net}_{k}^j) \\
+    &= \sum_{t=1}^{n_{k+1}} \left( \delta_{k+1}^t \cdot \frac{\partial \left( 
\sum_{s=0}^{n_{k}} x_{k}^s u_{k}^{st} \right) }{\partial x_{k}^j} \right) \cdot 
\phi'(\mathit{net}_{k}^j) \\
+    &= \sum_{t=1}^{n_{k+1}} \left( \delta_{k+1}^t \cdot u_{k}^{jt} \right) 
\cdot \phi'(\mathit{net}_{k}^j) \\
+\end{aligned}\]
+To sum up, we need the following equation for error back propagation
+\[\boxed{\delta_{k}^j = \sum_{t=1}^{n_{k+1}} \left( \delta_{k+1}^t \cdot 
u_{k}^{jt} \right) \cdot \phi'(\mathit{net}_{k}^j)}\]
+where $k = 1,...,N-1$, and $j = 1,...,n_{k}$.
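+
+For implementation it is convenient to restate this recursion in matrix form: letting $U_k$ denote the matrix with entries $(U_k)_{jt} = u_k^{jt}$ and $\delta_{k+1} = (\delta_{k+1}^1, ..., \delta_{k+1}^{n_{k+1}})^T$, the recursion reads
+\[\delta_k = \left( U_k \, \delta_{k+1} \right) \odot \phi'(\mathit{net}_k),\]
+where $\odot$ denotes the elementwise product and $\phi'$ is applied componentwise.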
+
+\subsubsection{The $\mathit{Gradient}$ Function}
+\begin{algorithm}[mlp-gradient$(u, y, z)$] \label{alg:mlp-gradient}
+\alginput{Coefficients $u = \{ u_{k-1}^{sj} \; | \; k = 1,...,N, \: s = 
0,...,n_{k-1}, \: j = 1,...,n_k\}$,\\
+start vector $y \in \mathbb{R}^{n_0}$,\\
+end vector $z \in \mathbb{R}^{n_N}$,\\
+activation unit $\phi : \mathbb{R} \to \mathbb{R}$}
+\algoutput{Gradient value $\nabla f(u)$ that consists of components $\nabla 
f(u)_{k-1}^{sj} = \frac{\partial f}{\partial u_{k-1}^{sj}}$}
+\begin{algorithmic}[1]
+    \State $(\mathit{net}, x) \set$ \texttt{feed-forward}$(u, y, \phi)$
+    \State $\delta_N \set$ \texttt{end-layer-delta-error}$(\mathit{net}, x, z, 
\phi')$
+    \State $\delta \set$ \texttt{error-back-propagation}$(\delta_N, 
\mathit{net}, u, \phi')$
+    \For{$k = 1,...,N$}
+        \For{$s = 0,...,n_{k-1}$}
+            \For{$j = 1,...,n_k$}
+                \State $\nabla f(u)_{k-1}^{sj} \set \delta_k^j x_{k-1}^s$
+                \Comment{Can be put together with the computation of delta 
$\delta$}
+            \EndFor
+        \EndFor
+    \EndFor
+    \State \Return $\nabla f(u)$
+\end{algorithmic}
+\end{algorithm}
+
+\paragraph{Activation Units $\phi$.}
+Common examples of activation units are
+\[\begin{alignedat}{3}
+\phi(\xi) &= \frac{1}{1 + e^{-\xi}}, &\quad \text{ (logistic function),}\\
+\phi(\xi) &= \frac{e^{\xi} - e^{-\xi}}{e^{\xi} + e^{-\xi}}, &\quad \text{ 
(hyperbolic tangent function)}\\
+\end{alignedat}\]
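+
+The corresponding derivatives used by backpropagation follow directly from these definitions (both are standard identities):
+\[\begin{alignedat}{3}
+\phi'(\xi) &= \phi(\xi) \left( 1 - \phi(\xi) \right), &\quad \text{ (logistic function),}\\
+\phi'(\xi) &= 1 - \phi(\xi)^2, &\quad \text{ (hyperbolic tangent function)}\\
+\end{alignedat}\]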
+
+\begin{algorithm}[feed-forward$(u, y, \phi)$] \label{alg:feed-forward}
+\alginput{Coefficients $u = \{ u_{k-1}^{sj} \; | \; k = 1,...,N, \: s = 
0,...,n_{k-1}, \: j = 1,...,n_k\}$,\\
+input vector $y \in \mathbb{R}^{n_0}$,\\
+activation unit $\phi : \mathbb{R} \to \mathbb{R}$}
+\algoutput{Input vectors $\mathit{net} = \{\mathit{net}_k^j \; | \; k = 
1,...,N, \: j = 1,...,n_k\}$,\\
+output vectors $x = \{x_k^j \; | \; k = 0,...,N, \: j = 0,...,n_k\}$}
+\begin{algorithmic}[1]
+    \For{$k = 0,...,N$}
+        \State $x_k^0 \set 1$
+    \EndFor
+    \State $x_0 \set y$ \Comment{For all components $x_0^j, y^j, \; j = 
1,...,n_0$}
+    \For{$k = 1,...,N$}
+        \For{$j = 1,...,n_k$}
+            \State $\mathit{net}_k^j \set 0$
+            \For{$s = 0,...,n_{k-1}$}
+                \State $\mathit{net}_k^j \set \mathit{net}_k^j + x_{k-1}^s 
u_{k-1}^{sj}$
+            \EndFor
+            \State $x_k^j = \phi(\mathit{net}_k^j)$
+        \EndFor
+    \EndFor
+    \State \Return $(\mathit{net}, x)$
+\end{algorithmic}
+\end{algorithm}
+
+\begin{algorithm}[end-layer-delta-error$(\mathit{net}, x, z, \phi')$] 
\label{alg:end-layer-delta-error}
+\alginput{Input vectors $\mathit{net} = \{\mathit{net}_k^j \; | \; k = 
1,...,N, \: j = 1,...,n_k\}$,\\
+output vectors $x = \{x_k^j \; | \; k = 0,...,N, \: j = 0,...,n_k\}$,\\
+end vector $z \in \mathbb{R}^{n_N}$,\\
+derivative of activation unit $\phi' : \mathbb{R} \to \mathbb{R}$}
+\algoutput{End layer delta $\delta_N = \{\delta_N^t \; | \; t = 1,...,n_N\}$}
+\begin{algorithmic}[1]
+    \For{$t = 1,...,n_N$}
+            \State $\delta_N^t \set (x_N^t - z^t) \phi'(\mathit{net}_N^t)$
+    \EndFor
+    \State \Return $\delta_N$
+\end{algorithmic}
+\end{algorithm}
+
+\begin{algorithm}[error-back-propagation$(\delta_N, \mathit{net}, u, \phi')$] 
\label{alg:error-back-propagation}
+\alginput{End layer delta $\delta_N = \{\delta_N^t \; | \; t = 1,...,n_N\}$,\\
+input vectors $\mathit{net} = \{\mathit{net}_k^j \; | \; k = 1,...,N, \: j = 
1,...,n_k\}$,\\
+coefficients $u = \{ u_{k-1}^{sj} \; | \; k = 1,...,N, \: s = 0,...,n_{k-1}, 
\: j = 1,...,n_k\}$,\\
+derivative of activation unit $\phi' : \mathbb{R} \to \mathbb{R}$}
+\algoutput{Delta $\delta = \{\delta_k^j \; | \; k = 1,...,N, \: j = 
1,...,n_k\}$}
+\begin{algorithmic}[1]
+    \For{$k = N-1,...,1$}
+        \For{$j = 0,...,n_k$}
+            \State $\delta_k^j \set 0$
+            \For{$t = 1,...,n_{k+1}$}
+                \State $\delta_k^j \set \delta_k^j + \delta_{k+1}^t u_k^{jt}$
+            \EndFor
+            \State $\delta_k^j \set \delta_k^j \phi'(\mathit{net}_k^j)$
+        \EndFor
+    \EndFor
+    \State \Return $\delta$
+\end{algorithmic}
+\end{algorithm}
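
The boxed formulas above fully determine the gradient, so they can be checked numerically. The following is a minimal, self-contained C++ sketch (not part of the patch itself) that implements the feed-forward, end-layer delta error, and error back-propagation recursions exactly as written above, using the logistic activation at every layer, and compares one analytically computed gradient entry against a finite difference; all network sizes, weights, and inputs are arbitrary illustrative values.

    // Gradient check for the design-document recursions on a tiny [2, 2, 1] network.
    // Feed-forward, end-layer delta error, error back-propagation, and the gradient
    // df/du_{k-1}^{sj} = delta_k^j * x_{k-1}^s are implemented as derived above,
    // with the logistic activation at every layer.
    #include <cmath>
    #include <cstdio>
    #include <vector>

    int main() {
        const int N = 2;                           // number of stages (layers)
        const std::vector<int> n = {2, 2, 1};      // n_0, ..., n_N
        // u[k-1] is (n[k-1]+1) x (n[k]+1); row 0 holds bias weights (x_{k-1}^0 = 1).
        std::vector<std::vector<std::vector<double> > > u(N);
        for (int k = 1; k <= N; ++k)
            u[k-1].assign(n[k-1] + 1, std::vector<double>(n[k] + 1, 0.1 * k));

        const std::vector<double> y = {0.5, -1.0}; // input vector
        const std::vector<double> z = {1.0};       // target vector

        auto phi  = [](double xi) { return 1.0 / (1.0 + std::exp(-xi)); };
        auto dphi = [&](double xi) { double v = phi(xi); return v * (1.0 - v); };

        // feed-forward(u, y, phi): fills net_k^j and x_k^j, with x_k^0 = 1.
        auto feedForward = [&](std::vector<std::vector<double> > &net,
                               std::vector<std::vector<double> > &x) {
            net.assign(N + 1, std::vector<double>());
            x.assign(N + 1, std::vector<double>());
            x[0].assign(n[0] + 1, 1.0);
            for (int j = 1; j <= n[0]; ++j) x[0][j] = y[j-1];
            for (int k = 1; k <= N; ++k) {
                net[k].assign(n[k] + 1, 0.0);
                x[k].assign(n[k] + 1, 1.0);
                for (int j = 1; j <= n[k]; ++j) {
                    for (int s = 0; s <= n[k-1]; ++s)
                        net[k][j] += x[k-1][s] * u[k-1][s][j];
                    x[k][j] = phi(net[k][j]);
                }
            }
        };
        // Objective f_(y,z)(u) = 1/2 * ||h(u, y) - z||^2.
        auto loss = [&]() {
            std::vector<std::vector<double> > net, x;
            feedForward(net, x);
            double f = 0.0;
            for (int t = 1; t <= n[N]; ++t) {
                double d = x[N][t] - z[t-1];
                f += 0.5 * d * d;
            }
            return f;
        };

        // Analytic gradient via the delta errors.
        std::vector<std::vector<double> > net, x;
        feedForward(net, x);
        std::vector<std::vector<double> > delta(N + 1);
        delta[N].assign(n[N] + 1, 0.0);
        for (int t = 1; t <= n[N]; ++t)                 // end-layer-delta-error
            delta[N][t] = (x[N][t] - z[t-1]) * dphi(net[N][t]);
        for (int k = N - 1; k >= 1; --k) {              // error-back-propagation
            delta[k].assign(n[k] + 1, 0.0);
            for (int j = 0; j <= n[k]; ++j) {
                for (int t = 1; t <= n[k+1]; ++t)
                    delta[k][j] += delta[k+1][t] * u[k][j][t];
                delta[k][j] *= dphi(net[k][j]);
            }
        }
        const int k = 1, s = 1, j = 1;                  // one hidden-layer coefficient
        double analytic = delta[k][j] * x[k-1][s];

        // Finite-difference check of the same coefficient.
        const double eps = 1e-6;
        double f0 = loss();
        u[k-1][s][j] += eps;
        std::printf("analytic %.8f  numeric %.8f\n", analytic, (loss() - f0) / eps);
        return 0;
    }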

http://git-wip-us.apache.org/repos/asf/incubator-madlib/blob/4fcb60ed/doc/literature.bib
----------------------------------------------------------------------
diff --git a/doc/literature.bib b/doc/literature.bib
index 08fb2dd..aadd65b 100644
--- a/doc/literature.bib
+++ b/doc/literature.bib
@@ -931,3 +931,21 @@ Applied Survival Analysis},
                Research, Asilomar, CA, USA, January 4-7, 2015, Online 
Proceedings},
   year      = {2015}
 }
+
+@misc{ann_wiki,
+    Url = {http://en.wikipedia.org/wiki/Artificial_neural_network},
+    Title = {Artificial neural network},
+    Author = {Wikipedia}
+}
+
+@misc{mlp_wiki,
+    Url = {http://en.wikipedia.org/wiki/Multilayer_perceptron},
+    Title = {Multilayer perceptron},
+    Author = {Wikipedia}
+}
+
+@misc{mlp_gradient_wisc,
+    Url = 
{http://homepages.cae.wisc.edu/~ece539/videocourse/notes/pdf/lec%2011%20MLP%20(3)%20BP.pdf},
+    Title = {{MLP(III): Back-Propagation}},
+    Author = {{Yu Hen Hu}}
+}

http://git-wip-us.apache.org/repos/asf/incubator-madlib/blob/4fcb60ed/doc/mainpage.dox.in
----------------------------------------------------------------------
diff --git a/doc/mainpage.dox.in b/doc/mainpage.dox.in
index 52afd48..f64de15 100644
--- a/doc/mainpage.dox.in
+++ b/doc/mainpage.dox.in
@@ -182,6 +182,9 @@ complete matrix stored as a distributed table.
     @defgroup grp_crf Conditional Random Field
     @ingroup grp_super
 
+    @defgroup grp_mlp Multilayer Perceptron
+    @ingroup grp_super
+
     @defgroup grp_regml Regression Models
     @ingroup grp_super
     A collection of methods for modeling conditional expectation of a response 
variable.
@@ -198,6 +201,7 @@ complete matrix stored as a distributed table.
         @defgroup grp_robust Robust Variance
     @}
 
+
     @defgroup grp_svm Support Vector Machines
     @ingroup grp_super
 

http://git-wip-us.apache.org/repos/asf/incubator-madlib/blob/4fcb60ed/src/modules/convex/algo/loss.hpp
----------------------------------------------------------------------
diff --git a/src/modules/convex/algo/loss.hpp b/src/modules/convex/algo/loss.hpp
index ddeee00..5075b21 100644
--- a/src/modules/convex/algo/loss.hpp
+++ b/src/modules/convex/algo/loss.hpp
@@ -1,4 +1,4 @@
-/* ----------------------------------------------------------------------- 
*//** 
+/* ----------------------------------------------------------------------- 
*//**
  *
  * @file loss.hpp
  *
@@ -21,7 +21,7 @@ namespace convex {
 
 // use Eigen
 using namespace madlib::dbal::eigen_integration;
-    
+
 // The reason for using ConstState instead of const State to reduce the
 // template type list: flexibility to high-level for mutability control
 // More: cast<ConstState>(MutableState) may not always work
@@ -42,7 +42,7 @@ void
 Loss<State, ConstState, Task>::transition(state_type &state,
         const tuple_type &tuple) {
     state.algo.loss += Task::loss(
-            state.task.model, 
+            state.task.model,
             tuple.indVar,
             tuple.depVar);
 }

http://git-wip-us.apache.org/repos/asf/incubator-madlib/blob/4fcb60ed/src/modules/convex/convex.hpp
----------------------------------------------------------------------
diff --git a/src/modules/convex/convex.hpp b/src/modules/convex/convex.hpp
index 117853a..66a9627 100644
--- a/src/modules/convex/convex.hpp
+++ b/src/modules/convex/convex.hpp
@@ -1,5 +1,5 @@
 /* 
-----------------------------------------------------------------------------
- * 
+ *
  * @file convex.hpp
  *
  * @brief Umbrella header that includes all convex task headers
@@ -10,4 +10,5 @@
 #include "utils_regularization.hpp"
 //#include "ridge_newton.hpp"
 #include "linear_svm_igd.hpp"
+#include "mlp_igd.hpp"
 

http://git-wip-us.apache.org/repos/asf/incubator-madlib/blob/4fcb60ed/src/modules/convex/mlp_igd.cpp
----------------------------------------------------------------------
diff --git a/src/modules/convex/mlp_igd.cpp b/src/modules/convex/mlp_igd.cpp
new file mode 100644
index 0000000..3647d5f
--- /dev/null
+++ b/src/modules/convex/mlp_igd.cpp
@@ -0,0 +1,237 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ * @file mlp_igd.cpp
+ *
+ * @brief Multilayer Perceptron functions
+ *
+ *//* ----------------------------------------------------------------------- 
*/
+#include <boost/lexical_cast.hpp>
+
+#include <dbconnector/dbconnector.hpp>
+#include <modules/shared/HandleTraits.hpp>
+
+#include "mlp_igd.hpp"
+
+#include "task/mlp.hpp"
+#include "algo/igd.hpp"
+#include "algo/loss.hpp"
+
+#include "type/tuple.hpp"
+#include "type/model.hpp"
+#include "type/state.hpp"
+
+namespace madlib {
+
+namespace modules {
+
+namespace convex {
+
+// These 2 classes contain public static methods that can be called
+typedef IGD<MLPIGDState<MutableArrayHandle<double> >, 
MLPIGDState<ArrayHandle<double> >,
+        MLP<MLPModel<MutableArrayHandle<double> >, MLPTuple > > 
MLPIGDAlgorithm;
+
+typedef Loss<MLPIGDState<MutableArrayHandle<double> >, 
MLPIGDState<ArrayHandle<double> >,
+        MLP<MLPModel<MutableArrayHandle<double> >, MLPTuple > > 
MLPLossAlgorithm;
+
+typedef MLP<MLPModel<MutableArrayHandle<double> >,MLPTuple> MLPTask;
+
+/**
+ * @brief Perform the multilayer perceptron transition step
+ *
+ * Called for each tuple.
+ */
+AnyType
+mlp_igd_transition::run(AnyType &args) {
+    // For the first tuple: args[0] is nothing more than a marker that
+    // indicates that we should do some initial operations.
+    // For other tuples: args[0] holds the computation state until last tuple
+    MLPIGDState<MutableArrayHandle<double> > state = args[0];
+
+    // initialize the state if this is the first tuple
+    if (state.algo.numRows == 0) {
+        if (!args[3].isNull()) {
+            MLPIGDState<ArrayHandle<double> > previousState = args[3];
+
+            state.allocate(*this, previousState.task.numberOfStages,
+                           previousState.task.numbersOfUnits);
+            state = previousState;
+        } else {
+            // configuration parameters
+            ArrayHandle<double> numbersOfUnits = 
args[4].getAs<ArrayHandle<double> >();
+
+            double stepsize = args[5].getAs<double>();
+
+            state.allocate(*this, numbersOfUnits.size() - 1,
+                           reinterpret_cast<const double 
*>(numbersOfUnits.ptr()));
+            state.task.stepsize = stepsize;
+
+
+            int activation = args[6].getAs<int>();
+
+            int is_classification = args[7].getAs<int>();
+            state.task.model.initialize(is_classification, activation);
+        }
+
+        // resetting in either case
+        state.reset();
+    }
+
+    // meta data
+    const uint16_t N = state.task.numberOfStages;
+    const double *n = state.task.numbersOfUnits;
+
+    // tuple
+    MappedColumnVector indVar;
+    MappedColumnVector depVar;
+    try {
+        // an exception is raised in the backend if args[2] contains nulls
+        MappedColumnVector x = args[1].getAs<MappedColumnVector>();
+        // x is a const reference, we can only rebind to change its pointer
+        indVar.rebind(x.memoryHandle(), x.size());
+        MappedColumnVector y = args[2].getAs<MappedColumnVector>();
+        depVar.rebind(y.memoryHandle(), y.size());
+
+    } catch (const ArrayWithNullException &e) {
+        return args[0];
+    }
+    MLPTuple tuple;
+    tuple.indVar.rebind(indVar.memoryHandle(), indVar.size());
+    tuple.depVar.rebind(depVar.memoryHandle(), depVar.size());
+
+    // Now do the transition step
+    MLPIGDAlgorithm::transition(state, tuple);
+    MLPLossAlgorithm::transition(state, tuple);
+    state.algo.numRows ++;
+
+    return state;
+}
+
+/**
+ * @brief Perform the preliminary aggregation function: merge transition states
+ */
+AnyType
+mlp_igd_merge::run(AnyType &args) {
+    MLPIGDState<MutableArrayHandle<double> > stateLeft = args[0];
+    MLPIGDState<ArrayHandle<double> > stateRight = args[1];
+
+    // We first handle the trivial case where this function is called with one
+    // of the states being the initial state
+    if (stateLeft.algo.numRows == 0) { return stateRight; }
+    else if (stateRight.algo.numRows == 0) { return stateLeft; }
+
+    // Merge states together
+    MLPIGDAlgorithm::merge(stateLeft, stateRight);
+    MLPLossAlgorithm::merge(stateLeft, stateRight);
+    // The following numRows update cannot be moved above, because the model
+    // averaging depends on the original values
+    stateLeft.algo.numRows += stateRight.algo.numRows;
+
+    return stateLeft;
+}
+
+/**
+ * @brief Perform the multilayer perceptron final step
+ */
+AnyType
+mlp_igd_final::run(AnyType &args) {
+    // We request a mutable object. Depending on the backend, this might 
perform
+    // a deep copy.
+    MLPIGDState<MutableArrayHandle<double> > state = args[0];
+
+    // Aggregates that haven't seen any data just return Null.
+    if (state.algo.numRows == 0) { return Null(); }
+
+    // finalizing
+    MLPIGDAlgorithm::final(state);
+
+    // Return the mean loss
+    state.algo.loss = state.algo.loss/static_cast<double>(state.algo.numRows);
+
+    // for stepsize tuning
+    std::stringstream debug;
+    debug << "loss: " << state.algo.loss;
+    elog(INFO,"%s",debug.str().c_str());
+    return state;
+}
+
+/**
+ * @brief Return the absolute difference in loss between two states
+ */
+AnyType
+internal_mlp_igd_distance::run(AnyType &args) {
+    MLPIGDState<ArrayHandle<double> > stateLeft = args[0];
+    MLPIGDState<ArrayHandle<double> > stateRight = args[1];
+    return std::abs(stateLeft.algo.loss - stateRight.algo.loss);
+}
+
+/**
+ * @brief Return the coefficients and diagnostic statistics of the state
+ */
+AnyType
+internal_mlp_igd_result::run(AnyType &args) {
+    MLPIGDState<ArrayHandle<double> > state = args[0];
+
+    HandleTraits<ArrayHandle<double> >::ColumnVectorTransparentHandleMap
+        flattenU;
+    flattenU.rebind(&state.task.model.u[0](0, 0),
+            state.task.model.arraySize(state.task.numberOfStages,
+                    state.task.numbersOfUnits)-2); // -2 for is_classification 
and activation
+    double loss = state.algo.loss;
+
+
+    AnyType tuple;
+    tuple << flattenU
+          << loss;
+    return tuple;
+}
+
+AnyType
+internal_predict_mlp::run(AnyType &args) {
+    MLPModel<MutableArrayHandle<double> > model;
+    MappedColumnVector indVar;
+    int is_response = args[5].getAs<int>();
+    MappedColumnVector coeff = args[0].getAs<MappedColumnVector>();
+    MappedColumnVector layerSizes = args[4].getAs<MappedColumnVector>();
+    // Input layer doesn't count
+    size_t numberOfStages = layerSizes.size()-1;
+    //#TODO this should be an int not a double
+    double is_classification = args[2].getAs<double>();
+    double activation = args[3].getAs<double>();
+    bool get_class = is_classification && is_response;
+
+    
model.rebind(&is_classification,&activation,&coeff.data()[0],numberOfStages,&layerSizes.data()[0]);
+    try {
+        MappedColumnVector x = args[1].getAs<MappedColumnVector>();
+        // x is a const reference, we can only rebind to change its pointer
+        indVar.rebind(x.memoryHandle(), x.size());
+    } catch (const ArrayWithNullException &e) {
+        return args[0];
+    }
+    ColumnVector prediction = MLPTask::predict(model, indVar, get_class);
+
+    return prediction;
+}
+
+
+} // namespace convex
+
+} // namespace modules
+
+} // namespace madlib
+

http://git-wip-us.apache.org/repos/asf/incubator-madlib/blob/4fcb60ed/src/modules/convex/mlp_igd.hpp
----------------------------------------------------------------------
diff --git a/src/modules/convex/mlp_igd.hpp b/src/modules/convex/mlp_igd.hpp
new file mode 100644
index 0000000..c957d97
--- /dev/null
+++ b/src/modules/convex/mlp_igd.hpp
@@ -0,0 +1,56 @@
+/* ----------------------------------------------------------------------- 
*//**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ *
+ * @file mlp_igd.hpp
+ *
+ *//* ----------------------------------------------------------------------- 
*/
+
+/**
+ * @brief Multilayer perceptron (incremental gradient): Transition function
+ */
+DECLARE_UDF(convex, mlp_igd_transition)
+
+/**
+ * @brief Multilayer perceptron (incremental gradient): State merge function
+ */
+DECLARE_UDF(convex, mlp_igd_merge)
+
+/**
+ * @brief Multilayer perceptron (incremental gradient): Final function
+ */
+DECLARE_UDF(convex, mlp_igd_final)
+
+/**
+ * @brief Multilayer perceptron (incremental gradient): Difference in
+ *     loss between two transition states
+ */
+DECLARE_UDF(convex, internal_mlp_igd_distance)
+
+/**
+ * @brief Multilayer perceptron (incremental gradient): Convert
+ *     transition state to result tuple
+ */
+DECLARE_UDF(convex, internal_mlp_igd_result)
+
+/**
+ * @brief Multilayer perceptron (incremental gradient): Predict
+ *      function for regression and classification probability
+ */
+
+DECLARE_UDF(convex, internal_predict_mlp)

http://git-wip-us.apache.org/repos/asf/incubator-madlib/blob/4fcb60ed/src/modules/convex/task/mlp.hpp
----------------------------------------------------------------------
diff --git a/src/modules/convex/task/mlp.hpp b/src/modules/convex/task/mlp.hpp
new file mode 100644
index 0000000..e66492b
--- /dev/null
+++ b/src/modules/convex/task/mlp.hpp
@@ -0,0 +1,332 @@
+/* ----------------------------------------------------------------------- 
*//**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ * @file mlp.hpp
+ *
+ * This file contains objective function related computation, which is called
+ * by classes in algo/, e.g.,  loss, gradient functions
+ *
+ *//* ----------------------------------------------------------------------- 
*/
+
+#ifndef MADLIB_MODULES_CONVEX_TASK_MLP_HPP_
+#define MADLIB_MODULES_CONVEX_TASK_MLP_HPP_
+
+namespace madlib {
+
+namespace modules {
+
+namespace convex {
+
+// Use Eigen
+using namespace madlib::dbal::eigen_integration;
+
+template <class Model, class Tuple>
+class MLP {
+public:
+    typedef Model model_type;
+    typedef Tuple tuple_type;
+    typedef typename Tuple::independent_variables_type
+        independent_variables_type;
+    typedef typename Tuple::dependent_variable_type dependent_variable_type;
+
+    static void gradientInPlace(
+            model_type                          &model,
+            const independent_variables_type    &y,
+            const dependent_variable_type       &z,
+            const double                        &stepsize);
+
+    static double loss(
+            const model_type                    &model,
+            const independent_variables_type    &y,
+            const dependent_variable_type       &z);
+
+    static ColumnVector predict(
+            const model_type                    &model,
+            const independent_variables_type    &y,
+            const bool                          get_class);
+
+    const static int RELU = 0;
+    const static int SIGMOID = 1;
+    const static int TANH = 2;
+
+    static double sigmoid(const double &xi) {
+        return 1. / (1. + std::exp(-xi));
+    }
+
+    static double relu(const double &xi) {
+        return xi*(xi>0);
+    }
+
+    static double tanh(const double &xi) {
+        return std::tanh(xi);
+    }
+
+
+private:
+
+    static double sigmoidDerivative(const double &xi) {
+        double value = sigmoid(xi);
+        return value * (1. - value);
+    }
+
+    static double reluDerivative(const double &xi) {
+        return xi>0;
+    }
+
+    static double tanhDerivative(const double &xi) {
+        double value = tanh(xi);
+        return 1-value*value;
+    }
+
+    static void feedForward(
+            const model_type                    &model,
+            const independent_variables_type    &y,
+            std::vector<ColumnVector>           &net,
+            std::vector<ColumnVector>           &x);
+
+    static void endLayerDeltaError(
+            const std::vector<ColumnVector>     &net,
+            const std::vector<ColumnVector>     &x,
+            const dependent_variable_type       &z,
+            ColumnVector                        &delta_N);
+
+    static void errorBackPropagation(
+            const ColumnVector                  &delta_N,
+            const std::vector<ColumnVector>     &net,
+            const model_type                    &model,
+            std::vector<ColumnVector>           &delta);
+};
+
+template <class Model, class Tuple>
+void
+MLP<Model, Tuple>::gradientInPlace(
+        model_type                          &model,
+        const independent_variables_type    &y,
+        const dependent_variable_type       &z,
+        const double                        &stepsize) {
+    (void) model;
+    (void) z;
+    (void) y;
+    (void) stepsize;
+    std::vector<ColumnVector> net;
+    std::vector<ColumnVector> x;
+    std::vector<ColumnVector> delta;
+    ColumnVector delta_N;
+
+    feedForward(model, y, net, x);
+    endLayerDeltaError(net, x, z, delta_N);
+    errorBackPropagation(delta_N, net, model, delta);
+
+    uint16_t N = model.u.size(); // assuming number of layers >= 1
+    uint16_t k, s, j;
+
+    std::vector<uint16_t> n; n.clear(); // number of units in each layer
+
+    n.push_back(model.u[0].rows() - 1);
+    for (k = 1; k <= N; k ++) {
+        n.push_back(model.u[k-1].cols() - 1);
+    }
+
+    for (k=1; k <= N; k++){
+        for (s=0; s <= n[k-1]; s++){
+            for (j=1; j <= n[k]; j++){
+                model.u[k-1](s,j) -= stepsize *  (delta[k](j) * x[k-1](s));
+            }
+        }
+    }
+}
+
+template <class Model, class Tuple>
+double
+MLP<Model, Tuple>::loss(
+        const model_type                    &model,
+        const independent_variables_type    &y,
+        const dependent_variable_type       &z) {
+    // Here we compute the loss. For regression we use the sum of squared errors;
+    // for classification, the loss term is cross entropy.
+    std::vector<ColumnVector> net;
+    std::vector<ColumnVector> x;
+
+    feedForward(model, y, net, x);
+    double loss = 0.;
+    uint16_t j;
+
+    for (j = 1; j < z.rows() + 1; j ++) {
+        if(model.is_classification){
+            // Cross entropy: RHS term is negative
+            loss -= z(j-1)*std::log(x.back()(j)) + 
(1-z(j-1))*std::log(1-x.back()(j));
+        }else{
+            double diff = x.back()(j) - z(j-1);
+            loss += diff * diff;
+        }
+    }
+    if(!model.is_classification){
+        loss /= 2.;
+    }else{
+        loss /= z.rows();
+    }
+    return loss;
+}
+
+template <class Model, class Tuple>
+ColumnVector
+MLP<Model, Tuple>::predict(
+        const model_type                    &model,
+        const independent_variables_type    &y,
+        const bool                          get_class
+        ) {
+    (void) model;
+    (void) y;
+    std::vector<ColumnVector> net;
+    std::vector<ColumnVector> x;
+
+    feedForward(model, y, net, x);
+    // Don't return the offset
+    ColumnVector output = x.back().tail(x.back().size()-1);
+    if(get_class){
+        int max_idx;
+        output.maxCoeff(&max_idx);
+        output.resize(1);
+        output[0] = (double)max_idx;
+    }
+    return output;
+}
+
+
+template <class Model, class Tuple>
+void
+MLP<Model, Tuple>::feedForward(
+        const model_type                    &model,
+        const independent_variables_type    &y,
+        std::vector<ColumnVector>           &net,
+        std::vector<ColumnVector>           &x){
+    // meta data and x_k^0 = 1
+    uint16_t k, j, s;
+    uint16_t N = model.u.size(); // assuming >= 1
+    net.resize(N + 1);
+    x.resize(N + 1);
+
+    std::vector<uint16_t> n; n.clear();
+    n.push_back(model.u[0].rows() - 1);
+    x[0].resize(n[0] + 1);
+    x[0](0) = 1.;
+    for (k = 1; k <= N; k ++) {
+        n.push_back(model.u[k-1].cols() - 1);
+        net[k].resize(n[k] + 1);
+        x[k].resize(n[k] + 1);
+        // Bias
+        x[k](0) = 1.;
+    }
+
+    // y is a mapped parameter from DB, aligning with x here
+    for (j = 1; j <= n[0]; j ++) { x[0](j) = y(j-1); }
+
+    for (k = 1; k < N; k ++) {
+        for (j = 1; j <= n[k]; j ++) {
+            net[k](j) = 0.;
+            for (s = 0; s <= n[k-1]; s ++) {
+                net[k](j) += x[k-1](s) * model.u[k-1](s, j);
+            }
+            if(model.activation==RELU)
+                x[k](j) = relu(net[k](j));
+            else if(model.activation==SIGMOID)
+                x[k](j) = sigmoid(net[k](j));
+            else
+                x[k](j) = tanh(net[k](j));
+        }
+    }
+
+    // output layer computation
+    for (j = 1; j <= n[N]; j ++) {
+        x[N](j) = 0.;
+        for (s = 0; s <= n[N-1]; s ++) {
+            x[N](j) += x[N-1](s) * model.u[N-1](s, j);
+        }
+    }
+    // Numerically stable calculation of softmax
+    ColumnVector last_x = x[N].tail(n[N]);
+    if(model.is_classification){
+        double max_x = last_x.maxCoeff();
+        last_x = (last_x.array() - max_x).exp();
+        last_x /= last_x.sum();
+    }
+    x[N].tail(n[N]) = last_x;
+}
+
+template <class Model, class Tuple>
+void
+MLP<Model, Tuple>::endLayerDeltaError(
+        const std::vector<ColumnVector>     &net,
+        const std::vector<ColumnVector>     &x,
+        const dependent_variable_type       &z,
+        ColumnVector                        &delta_N) {
+    //meta data
+    uint16_t t;
+    uint16_t N = x.size() - 1; // assuming >= 1
+    uint16_t n_N = x[N].rows() - 1;
+    delta_N.resize(n_N + 1);
+
+    for (t = 1; t <= n_N; t ++) {
+               delta_N(t) = (x[N](t) - z(t-1));
+    }
+}
+
+template <class Model, class Tuple>
+void
+MLP<Model, Tuple>::errorBackPropagation(
+        const ColumnVector                  &delta_N,
+        const std::vector<ColumnVector>     &net,
+        const model_type                    &model,
+        std::vector<ColumnVector>           &delta) {
+    // meta data
+    uint16_t k, j, t;
+    uint16_t N = model.u.size(); // assuming >= 1
+    delta.resize(N + 1);
+
+    std::vector<uint16_t> n; n.clear();
+    n.push_back(model.u[0].rows() - 1);
+    for (k = 1; k <= N; k ++) {
+        n.push_back(model.u[k-1].cols() - 1);
+        delta[k].resize(n[k]+1);
+    }
+    delta[N] = delta_N;
+
+    for (k = N - 1; k >= 1; k --) {
+        for (j = 0; j <= n[k]; j ++) {
+            delta[k](j) = 0.;
+            for (t = 1; t <= n[k+1]; t ++) {
+                delta[k](j) += delta[k+1](t) * model.u[k](j, t);
+            }
+            if(model.activation==RELU)
+                delta[k](j) = delta[k](j) * reluDerivative(net[k](j));
+            else if(model.activation==SIGMOID)
+                delta[k](j) = delta[k](j) * sigmoidDerivative(net[k](j));
+            else
+                delta[k](j) = delta[k](j) * tanhDerivative(net[k](j));
+        }
+    }
+}
+
+} // namespace convex
+
+} // namespace modules
+
+} // namespace madlib
+
+#endif
+
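
The output-layer computation above subtracts the maximum logit before exponentiating. A small standalone sketch (not from the patch, with illustrative values) showing why that shift matters for large logits:

    // Naive vs. max-shifted softmax on large logits (illustrative values only).
    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    int main() {
        std::vector<double> logits = {1000.0, 1001.0, 1002.0};
        double max_x = *std::max_element(logits.begin(), logits.end());

        double naive_denom = 0.0, shifted_denom = 0.0;
        for (double v : logits) {
            naive_denom   += std::exp(v);          // exp(1000) overflows to inf
            shifted_denom += std::exp(v - max_x);  // exponents <= 0, always finite
        }
        std::printf("naive:   %g\n", std::exp(logits[2]) / naive_denom);           // nan
        std::printf("shifted: %g\n", std::exp(logits[2] - max_x) / shifted_denom); // ~0.665
        return 0;
    }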

http://git-wip-us.apache.org/repos/asf/incubator-madlib/blob/4fcb60ed/src/modules/convex/type/model.hpp
----------------------------------------------------------------------
diff --git a/src/modules/convex/type/model.hpp 
b/src/modules/convex/type/model.hpp
index 953fc2e..9b68af8 100644
--- a/src/modules/convex/type/model.hpp
+++ b/src/modules/convex/type/model.hpp
@@ -97,6 +97,146 @@ struct LMFModel {
 typedef HandleTraits<MutableArrayHandle<double> 
>::ColumnVectorTransparentHandleMap
     GLMModel;
 
+// The necessity of this wrapper is to allow classes in algo/ and task/ to
+// have a type that they can template over
+template <class Handle>
+struct MLPModel {
+    typename HandleTraits<Handle>::ReferenceToUInt16 is_classification;
+    typename HandleTraits<Handle>::ReferenceToUInt16 activation;
+    std::vector<Eigen::Map<Matrix > > u;
+
+    /**
+     * @brief Space needed.
+     *
+     * Extra information besides the values in the matrix, such as its
+     * dimensions, is necessary so that the matrix can perform operations.
+     * These are stored in the HandleMap.
+     */
+    static inline uint32_t arraySize(const uint16_t &inNumberOfStages,
+                                     const double *inNumbersOfUnits) {
+        // inNumberOfStages == 0 is not an expected value, but
+        // it won't cause exception -- returning 0
+        uint32_t size = 0;
+        size_t N = inNumberOfStages;
+        const double *n = inNumbersOfUnits;
+        size_t k;
+        for (k = 1; k <= N; k ++) {
+            size += (n[k-1] + 1) * (n[k] + 1);
+        }
+        return 1 +       // is_classification
+               1 +       // activation
+               size;     // weights (u)
+    }
+
+    /**
+     * @brief Initialize the model randomly
+     */
+    void initialize(int is_classification_in, int activation_in) {
+        is_classification = is_classification_in;
+        activation = activation_in;
+        // using madlib::dbconnector::$database::NativeRandomNumberGenerator
+        NativeRandomNumberGenerator rng;
+
+        // Scaling factor for weight initialization
+        double epsilon = 0.0001;
+
+
+        double base = rng.min();
+        double span = rng.max() - base;
+
+        uint16_t N = u.size(); // assuming number of layers >= 1
+        uint16_t k, s, j;
+
+        std::vector<uint16_t> n; n.clear(); // number of units in each layer
+
+        n.push_back(u[0].rows() - 1);
+        for (k = 1; k <= N; k ++) {
+            n.push_back(u[k-1].cols() - 1);
+        }
+
+        for (k=1; k <= N; k++){
+            for (s=0; s <= n[k-1]; s++){
+                u[k-1](s,0)=1;
+                for (j=1; j <= n[k]; j++){
+                    // Generate normal(0,epsilon) value using Box-Muller 
transform
+                    double u1 = (rng()-base)/span;
+                    double u2 = (rng()-base)/span;
+                    double z = std::sqrt(-2*std::log(u1))*std::cos(2*M_PI*u2);
+                    u[k-1](s,j) = epsilon*z;
+                }
+            }
+        }
+    }
+
+    uint32_t rebind(const double *is_classification_in,
+                    const double *activation_in,
+                    const double *data,
+                    const uint16_t &inNumberOfStages,
+                    const double *inNumbersOfUnits) {
+        size_t N = inNumberOfStages;
+        const double *n = inNumbersOfUnits;
+        size_t k;
+
+        is_classification.rebind(is_classification_in);
+        activation.rebind(activation_in);
+
+        uint32_t sizeOfU = 0;
+        u.clear();
+        for (k = 1; k <= N; k ++) {
+            u.push_back(Eigen::Map<Matrix >(
+                    const_cast<double*>(data + sizeOfU),
+                    n[k-1] + 1, n[k] + 1));
+            sizeOfU += (n[k-1] + 1) * (n[k] + 1);
+        }
+
+        return sizeOfU;
+    }
+
+    /*
+     *  Some operator wrappers for u.
+     */
+    MLPModel &operator*=(const double &c) {
+        size_t k;
+        for (k = 1; k <= u.size(); k ++) {
+            u[k-1] *= c;
+        }
+
+        return *this;
+    }
+
+    template<class OtherHandle>
+    MLPModel &operator-=(const MLPModel<OtherHandle> &inOtherModel) {
+        size_t k;
+        for (k = 1; k <= u.size() && k <= inOtherModel.u.size(); k ++) {
+            u[k-1] -= inOtherModel.u[k-1];
+        }
+
+        return *this;
+    }
+
+    template<class OtherHandle>
+    MLPModel &operator+=(const MLPModel<OtherHandle> &inOtherModel) {
+        size_t k;
+        for (k = 1; k <= u.size() && k <= inOtherModel.u.size(); k ++) {
+            u[k-1] += inOtherModel.u[k-1];
+        }
+
+        return *this;
+    }
+
+    template<class OtherHandle>
+    MLPModel &operator=(const MLPModel<OtherHandle> &inOtherModel) {
+        size_t k;
+        for (k = 1; k <= u.size() && k <= inOtherModel.u.size(); k ++) {
+            u[k-1] = inOtherModel.u[k-1];
+        }
+        is_classification = inOtherModel.is_classification;
+        activation = inOtherModel.activation;
+
+        return *this;
+    }
+};
+
 } // namespace convex
 
 } // namespace modules
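
The random initialization in MLPModel::initialize above draws approximately normal(0, epsilon) weights via the Box-Muller transform from a uniform generator. A minimal standalone sketch of that step, with std::rand standing in for MADlib's NativeRandomNumberGenerator and purely illustrative values:

    // Box-Muller: two uniform draws -> one standard normal draw, scaled by epsilon.
    #include <cmath>
    #include <cstdio>
    #include <cstdlib>

    int main() {
        const double epsilon = 0.0001;   // scaling factor, as in the patch
        std::srand(42);
        for (int i = 0; i < 5; ++i) {
            // Map to (0, 1] so that log(u1) stays finite.
            double u1 = (std::rand() + 1.0) / (RAND_MAX + 1.0);
            double u2 = (std::rand() + 1.0) / (RAND_MAX + 1.0);
            double z = std::sqrt(-2.0 * std::log(u1)) * std::cos(2.0 * M_PI * u2);
            std::printf("weight %d: %g\n", i, epsilon * z);
        }
        return 0;
    }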

http://git-wip-us.apache.org/repos/asf/incubator-madlib/blob/4fcb60ed/src/modules/convex/type/state.hpp
----------------------------------------------------------------------
diff --git a/src/modules/convex/type/state.hpp 
b/src/modules/convex/type/state.hpp
index a061fdd..66f5023 100644
--- a/src/modules/convex/type/state.hpp
+++ b/src/modules/convex/type/state.hpp
@@ -64,14 +64,14 @@ public:
                 dbal::DoZero, dbal::ThrowBadAlloc>(
                 arraySize(inRowDim, inColDim, inMaxRank));
 
-        // This rebind is totally for the following 3 lines of code to take
+        // This rebind is for the following 3 lines of code to take
         // effect. I can also do something like "mStorage[0] = inRowDim",
         // but I am not clear about the type casting
         rebind();
         task.rowDim = inRowDim;
         task.colDim = inColDim;
         task.maxRank = inMaxRank;
-        
+
         // This time all the member fields are correctly binded
         rebind();
     }
@@ -177,7 +177,7 @@ public:
 /**
  * @brief Inter- (Task State) and intra-iteration (Algo State) state of
  *        incremental gradient descent for generalized linear models
- * 
+ *
  * Generalized Linear Models (GLMs): Logistic regression, Linear SVM
  *
  * TransitionState encapsualtes the transition state during the
@@ -219,7 +219,7 @@ public:
 
         task.dimension.rebind(&mStorage[0]);
         task.dimension = inDimension;
-        
+
         rebind();
     }
 
@@ -298,7 +298,7 @@ public:
 /**
  * @brief Inter- (Task State) and intra-iteration (Algo State) state of
  *        Conjugate Gradient for generalized linear models
- * 
+ *
  * Generalized Linear Models (GLMs): Logistic regression, Linear SVM
  *
  * TransitionState encapsualtes the transition state during the
@@ -340,7 +340,7 @@ public:
 
         task.dimension.rebind(&mStorage[0]);
         task.dimension = inDimension;
-        
+
         rebind();
     }
 
@@ -427,9 +427,9 @@ public:
  * @brief Inter- (Task State) and intra-iteration (Algo State) state of
  *        Newton's method for generic objective functions (any tasks)
  *
- * This class assumes that the coefficients are of type vector. Low-rank 
+ * This class assumes that the coefficients are of type vector. Low-rank
  * matrix factorization, neural networks would not be able to use this.
- * 
+ *
  * TransitionState encapsualtes the transition state during the
  * aggregate function during an iteration. To the database, the state is
  * exposed as a single DOUBLE PRECISION array, to the C++ code it is a proper
@@ -469,7 +469,7 @@ public:
 
         task.dimension.rebind(&mStorage[0]);
         task.dimension = inDimension;
-        
+
         rebind();
     }
 
@@ -541,6 +541,161 @@ public:
     } algo;
 };
 
+
+/**
+ * @brief Inter- (Task State) and intra-iteration (Algo State) state of
+ *        incremental gradient descent for multilayer perceptron
+ *
+ * TransitionState encapsulates the transition state during the
+ * aggregate function during an iteration. To the database, the state is
+ * exposed as a single DOUBLE PRECISION array, to the C++ code it is a proper
+ * object containing scalars and vectors.
+ *
+ * Note: We assume that the DOUBLE PRECISION array is initialized by the
+ * database with length at least 6, and that the first element
+ * is 0 (the exact values of the other elements are ignored).
+ *
+ */
+template <class Handle>
+class MLPIGDState {
+    template <class OtherHandle>
+    friend class MLPIGDState;
+
+public:
+    MLPIGDState(const AnyType &inArray) : mStorage(inArray.getAs<Handle>()) {
+        rebind();
+    }
+
+    /**
+     * @brief Convert to backend representation
+     *
+     * We define this function so that we can use State in the
+     * argument list and as a return type.
+     */
+    inline operator AnyType() const {
+        return mStorage;
+    }
+
+    /**
+     * @brief Allocating the incremental gradient state.
+     */
+    inline void allocate(const Allocator &inAllocator,
+                         const uint16_t &inNumberOfStages,
+                         const double *inNumbersOfUnits) {
+        mStorage = inAllocator.allocateArray<double, dbal::AggregateContext,
+                dbal::DoZero, dbal::ThrowBadAlloc>(
+                        arraySize(inNumberOfStages, inNumbersOfUnits));
+
+        // This rebind is for the following lines of code to take
+        // effect. I can also do something like "mStorage[0] = N",
+        // but I am not clear about the type binding/alignment
+        rebind();
+        task.numberOfStages = inNumberOfStages;
+        uint16_t N = inNumberOfStages;
+        uint16_t k;
+        for (k = 0; k <= N; k ++) {
+            task.numbersOfUnits[k] = inNumbersOfUnits[k];
+        }
+
+        // This time all the member fields are correctly binded
+        rebind();
+    }
+
+    /**
+     * @brief We need to support assigning the previous state
+     */
+    template <class OtherHandle>
+    MLPIGDState &operator=(const MLPIGDState<OtherHandle> &inOtherState) {
+        for (size_t i = 0; i < mStorage.size(); i++) {
+            mStorage[i] = inOtherState.mStorage[i];
+        }
+
+        return *this;
+    }
+
+    /**
+     * @brief Reset the intra-iteration fields.
+     */
+    inline void reset() {
+        algo.numRows = 0;
+        algo.loss = 0.;
+        algo.incrModel = task.model;
+    }
+
+    static inline uint32_t arraySize(const uint16_t &inNumberOfStages,
+                                     const double *inNumbersOfUnits) {
+        uint32_t sizeOfModel =
+            MLPModel<Handle>::arraySize(inNumberOfStages, inNumbersOfUnits);
+        return 1                        // numberOfStages = N
+            + (inNumberOfStages + 1)    // numbersOfUnits: size is (N + 1)
+            + 1                         // stepsize
+            + sizeOfModel               // model
+
+            + 1                         // numRows
+            + 1                         // loss
+            + sizeOfModel;              // incrModel
+    }
+
+private:
+    /**
+     * @brief Rebind to a new storage array.
+     *
+     * Array layout (iteration refers to one aggregate-function call):
+     * Inter-iteration components (updated in final function):
+     * - 0: numberOfStages (number of stages (layers), design doc: N)
+     * - 1 to N + 1: numbersOfUnits (numbers of activation units, design doc: n_0,...,n_N)
+     * - N + 2: stepsize (step size of gradient steps)
+     * - N + 3: is_classification (do classification)
+     * - N + 4: activation (activation function)
+     * - N + 5: coeff (coefficients, design doc: u)
+     *
+     * Intra-iteration components (updated in transition step):
+     *   sizeOfModel = # of entries in u + 2 = \sum_{k=1}^N (n_{k-1}+1)(n_k+1) + 2
+     * - N + 3 + sizeOfModel: numRows (number of rows processed in this iteration)
+     * - N + 4 + sizeOfModel: loss (accumulated loss value)
+     * - N + 5 + sizeOfModel: coeff of the incrementally updated (volatile) model;
+     *   its is_classification and activation are shared with the model above
+     */
+    void rebind() {
+        task.numberOfStages.rebind(&mStorage[0]);
+        size_t N = task.numberOfStages;
+        task.numbersOfUnits =
+            reinterpret_cast<dimension_pointer_type>(&mStorage[1]);
+        task.stepsize.rebind(&mStorage[N + 2]);
+        uint32_t sizeOfModel = task.model.rebind(&mStorage[N + 3],&mStorage[N 
+ 4],&mStorage[N + 5],
+                task.numberOfStages, task.numbersOfUnits);
+
+        algo.numRows.rebind(&mStorage[N + 5 + sizeOfModel]);
+        algo.loss.rebind(&mStorage[N + 6 + sizeOfModel]);
+        algo.incrModel.rebind(&mStorage[N + 3],&mStorage[N + 4],&mStorage[N + 
7 + sizeOfModel],
+                task.numberOfStages, task.numbersOfUnits);
+
+    }
+
+    Handle mStorage;
+
+    typedef typename HandleTraits<Handle>::ReferenceToUInt16 dimension_type;
+    typedef typename HandleTraits<Handle>::DoublePtr dimension_pointer_type;
+    typedef typename HandleTraits<Handle>::ReferenceToUInt64 count_type;
+    typedef typename HandleTraits<Handle>::ReferenceToDouble numeric_type;
+
+public:
+    struct TaskState {
+        dimension_type numberOfStages;
+        dimension_pointer_type numbersOfUnits;
+        numeric_type stepsize;
+        MLPModel<Handle> model;
+    } task;
+
+    struct AlgoState {
+        count_type numRows;
+        numeric_type loss;
+        MLPModel<Handle> incrModel;
+    } algo;
+};
+
+
 } // namespace convex
 
 } // namespace modules
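
To make the layout concrete, here is a minimal sketch (not part of the patch) that traces the offsets used by MLPIGDState::rebind() and the length returned by arraySize() for an assumed [4, 3, 1] network; note that arraySize() reserves two extra slots for the incremental model's is_classification and activation, which rebind() instead shares with the inter-iteration model.

    // Offsets of the MLPIGDState layout for an assumed [4, 3, 1] network
    // (4 inputs, one hidden layer of 3 units, 1 output), following the
    // arraySize()/rebind() logic above.
    #include <cstdio>
    #include <vector>

    int main() {
        const std::vector<int> n = {4, 3, 1};      // n_0, ..., n_N (assumed sizes)
        const int N = static_cast<int>(n.size()) - 1;

        int weights = 0;                           // number of entries in u
        for (int k = 1; k <= N; ++k)
            weights += (n[k-1] + 1) * (n[k] + 1);
        const int sizeOfModel = weights + 2;       // + is_classification, activation

        std::printf("numberOfStages    -> index 0\n");
        std::printf("numbersOfUnits    -> indices 1..%d\n", N + 1);
        std::printf("stepsize          -> index %d\n", N + 2);
        std::printf("is_classification -> index %d, activation -> index %d\n", N + 3, N + 4);
        std::printf("u                 -> indices %d..%d\n", N + 5, N + 4 + weights);
        std::printf("numRows           -> index %d\n", N + 5 + weights);
        std::printf("loss              -> index %d\n", N + 6 + weights);
        std::printf("incremental u     -> indices %d..%d\n",
                    N + 7 + weights, N + 6 + 2 * weights);
        std::printf("allocated length  = %d\n",
                    1 + (N + 1) + 1 + sizeOfModel + 1 + 1 + sizeOfModel);
        return 0;
    }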

http://git-wip-us.apache.org/repos/asf/incubator-madlib/blob/4fcb60ed/src/modules/convex/type/tuple.hpp
----------------------------------------------------------------------
diff --git a/src/modules/convex/type/tuple.hpp 
b/src/modules/convex/type/tuple.hpp
index 7ddb1b7..4b9c55e 100644
--- a/src/modules/convex/type/tuple.hpp
+++ b/src/modules/convex/type/tuple.hpp
@@ -15,12 +15,17 @@
 
 #include "dependent_variable.hpp"
 
+#include <dbconnector/dbconnector.hpp>
+
 namespace madlib {
 
 namespace modules {
 
 namespace convex {
 
+// Use Eigen
+using namespace madlib::dbal::eigen_integration;
+
 template <class IndependentVariables, class DependentVariable>
 struct ExampleTuple {
     typedef IndependentVariables independent_variables_type;
@@ -59,6 +64,8 @@ typedef ExampleTuple<MappedColumnVector, double> GLMTuple;
 // madlib::modules::convex::MatrixIndex
 typedef ExampleTuple<MatrixIndex, double> LMFTuple;
 
+typedef ExampleTuple<MappedColumnVector, MappedColumnVector> MLPTuple;
+
 } // namespace convex
 
 } // namespace modules
