incubator-systemml git commit: [SYSTEMML-1606] Update notebook samples with latest code

acs_s Fri, 12 May 2017 01:12:58 -0700

Repository: incubator-systemml
Updated Branches:
  refs/heads/master a27663946 -> 7fe372b9a



[SYSTEMML-1606] Update notebook samples with latest code


Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo
Commit: 
http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/7fe372b9
Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/7fe372b9
Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/7fe372b9

Branch: refs/heads/master
Commit: 7fe372b9adcb91d6ac5fb22646f33e488d29cf2a
Parents: a276639
Author: Arvind Surve <[email protected]>
Authored: Fri May 12 01:11:51 2017 -0700
Committer: Arvind Surve <[email protected]>
Committed: Fri May 12 01:11:51 2017 -0700

----------------------------------------------------------------------
 .../Deep Learning Image Classification.ipynb    | 489 -------------------
 .../Deep_Learning_Image_Classification.ipynb    | 406 +++++++++++++++
 2 files changed, 406 insertions(+), 489 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/7fe372b9/samples/jupyter-notebooks/Deep
 Learning Image Classification.ipynb
----------------------------------------------------------------------
diff --git a/samples/jupyter-notebooks/Deep Learning Image Classification.ipynb 
b/samples/jupyter-notebooks/Deep Learning Image Classification.ipynb
deleted file mode 100644
index eac20fd..0000000
--- a/samples/jupyter-notebooks/Deep Learning Image Classification.ipynb        
+++ /dev/null
@@ -1,489 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Deep Learning Image Classification\n",
-    "\n",
-    "This notebook shows SystemML Deep Learning functionality to map images of 
single digit numbers to their corresponding numeric representations. See 
[Getting Started with Deep Learning and 
Python](http://www.pyimagesearch.com/2014/09/22/getting-started-deep-learning-python/)
 for an explanation of the used deep learning concepts and assumptions.\n",
-    "\n",
-    "The downloaded MNIST dataset contains labeled images of handwritten 
digits, where each example is a 28x28 pixel image of grayscale values in the 
range [0,255] stretched out as 784 pixels, and each label is one of 10 possible 
digits in [0,9]. We download 60,000 training examples, and 10,000 test 
examples, where the format is \"label, pixel_1, pixel_2, ..., pixel_n\". We 
train a SystemML LeNet model. The results of the learning algorithms have an 
accuracy of 98 percent.\n",
-    "\n",
-    "1. [Install and load SystemML and other libraries](#load_systemml)\n",
-    "1. [Download and Access MNIST data](#access_data)\n",
-    "1. [Train a CNN classifier for MNIST handwritten digits](#train)\n",
-    "1. [Detect handwritten Digits](#predict)\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "collapsed": true
-   },
-   "source": [
-    "<div style=\"text-align:center\" markdown=\"1\">\n",
-    "![Image of Image to 
Digit](https://www.wolfram.com/mathematica/new-in-10/enhanced-image-processing/HTMLImages.en/handwritten-digits-classification/smallthumb_10.gif)\n",
-    "Mapping images of numbers to numbers\n",
-    "</div>"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "<a id=\"load_systemml\"></a>\n",
-    "## Install and load SystemML and other libraries"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": false,
-    "scrolled": false
-   },
-   "outputs": [],
-   "source": [
-    "#!pip install --user systemml>0.13.0\n",
-    "!pip install 
~/git/incubator-systemml/target/systemml-0.15.0-incubating-SNAPSHOT-python.tgz"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": false,
-    "scrolled": false
-   },
-   "outputs": [],
-   "source": [
-    "!pip show systemml"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": false,
-    "scrolled": false
-   },
-   "outputs": [],
-   "source": [
-    "# Create symbolic link in ~/data/libs to use installed site-packages 
SystemML jar as opposed to DSX platform SystemML jar\n",
-    "# !ln -s -f 
~/.local/lib/python2.7/site-packages/systemml/systemml-java/*.jar ~/data/libs/"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": false
-   },
-   "outputs": [],
-   "source": [
-    "from systemml import MLContext, dml\n",
-    "\n",
-    "ml = MLContext(sc)\n",
-    "\n",
-    "print \"Spark Version:\", sc.version\n",
-    "print \"SystemML Version:\", ml.version()\n",
-    "print \"SystemML Built-Time:\", ml.buildTime()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": false
-   },
-   "outputs": [],
-   "source": [
-    "import warnings\n",
-    "warnings.filterwarnings(\"ignore\")\n",
-    "from sklearn import datasets\n",
-    "from sklearn.cross_validation import train_test_split\n",
-    "from sklearn.metrics import classification_report\n",
-    "import pandas as pd\n",
-    "import numpy as np\n",
-    "import matplotlib.pyplot as plt\n",
-    "#import matplotlib.image as mpimg\n",
-    "%matplotlib inline"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "<a id=\"access_data\"></a>\n",
-    "## Download and Access MNIST data\n",
-    "\n",
-    "Download the [MNIST data from the MLData 
repository](http://mldata.org/repository/data/viewslug/mnist-original/), and 
then split and save."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": false,
-    "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "mnist = datasets.fetch_mldata(\"MNIST Original\")\n",
-    "\n",
-    "print \"Mnist data features:\", mnist.data.shape\n",
-    "print \"Mnist data label:\", mnist.target.shape\n",
-    "\n",
-    "trainX, testX, trainY, testY = train_test_split(mnist.data, 
mnist.target.astype(\"int0\"), test_size = 0.142857)\n",
-    "\n",
-    "trainD = np.concatenate((trainY.reshape(trainY.size, 1), 
trainX),axis=1)\n",
-    "testD  = np.concatenate((testY.reshape (testY.size, 1),  
testX),axis=1)\n",
-    "\n",
-    "print \"Images for training:\", trainD.shape\n",
-    "print \"Images used for testing:\", testD.shape\n",
-    "pix = int(np.sqrt(trainD.shape[1]))\n",
-    "print \"Each image is:\", pix, \"by\", pix, \"pixels\"\n",
-    "\n",
-    "np.savetxt('data/mnist/mnist_train.csv', trainD, fmt='%u', 
delimiter=\",\")\n",
-    "np.savetxt('data/mnist/mnist_test.csv', testD, fmt='%u', delimiter=\",\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Alternatively get the data from here."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": false,
-    "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "%%sh\n",
-    "mkdir -p data/mnist/\n",
-    "cd data/mnist/\n",
-    "curl -O https://pjreddie.com/media/files/mnist_train.csv\n";,
-    "curl -O https://pjreddie.com/media/files/mnist_test.csv\n";,
-    "wc -l mnist*"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Read the data."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": false
-   },
-   "outputs": [],
-   "source": [
-    "trainData = np.genfromtxt('data/mnist/mnist_train.csv', 
delimiter=\",\")\n",
-    "testData  = np.genfromtxt('data/mnist/mnist_test.csv', 
delimiter=\",\")\n",
-    "\n",
-    "print \"Training data: \", trainData.shape\n",
-    "print \"Test data: \", testData.shape"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": false
-   },
-   "outputs": [],
-   "source": [
-    "pd.set_option('display.max_columns', 200)\n",
-    "pd.DataFrame(testData[1:10,],dtype='uint')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "<a id=\"train\"></a>\n",
-    "## Develop LeNet CNN classifier on Training Data"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "<div style=\"text-align:center\" markdown=\"1\">\n",
-    "![Image of Image to 
Digit](http://www.ommegaonline.org/admin/journalassistance/picturegallery/896.jpg)\n",
-    "MNIST digit recognition â LeNet architecture\n",
-    "</div>"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Download the SystemML LeNet Implementation"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": false,
-    "scrolled": false
-   },
-   "outputs": [],
-   "source": [
-    "!wget -N -q 
'https://raw.githubusercontent.com/apache/incubator-systemml/master/scripts/staging/SystemML-NN/examples/mnist_lenet.dml'\n",
-    "#!cat mnist_lenet.dml"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": false,
-    "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "#import requests\n",
-    "#get = 
requests.get('https://raw.githubusercontent.com/apache/incubator-systemml/master/scripts/staging/SystemML-NN/examples/mnist_lenet.dml')\n",
-    "#mnist_lenet = get.text.replace(\"epochs = 10\", \"epochs = 1\")\n",
-    "#with open('mnist_lenet.dml','wb') as out:\n",
-    "#    out.write(mnist_lenet)\n",
-    "#print mnist_lenet    "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Download SystemML neural network library"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "!svn --force export 
https://github.com/apache/incubator-systemml/trunk/scripts/staging/SystemML-NN/nn";
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Train Model using SystemML LeNet CNN."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "(on a Mac Book, this takes approx. 5-6 mins for 1 epoch)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": false,
-    "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "script = \"\"\"\n",
-    "  source(\"mnist_lenet.dml\") as mnist_lenet\n",
-    "\n",
-    "  # Bind training data\n",
-    "  n = nrow(data)\n",
-    "\n",
-    "  # Extract images and labels\n",
-    "  images = data[,2:ncol(data)]\n",
-    "  labels = data[,1]\n",
-    "\n",
-    "  # Scale images to [-1,1], and one-hot encode the labels\n",
-    "  images = (images / 255.0) * 2 - 1\n",
-    "  labels = table(seq(1, n), labels+1, n, 10)\n",
-    "\n",
-    "  # Split into training (55,000 examples) and validation (5,000 
examples)\n",
-    "  X = images[5001:nrow(images),]\n",
-    "  X_val = images[1:5000,]\n",
-    "  y = labels[5001:nrow(images),]\n",
-    "  y_val = labels[1:5000,]\n",
-    "\n",
-    "  # Train the model using channel, height, and width to produce 
weights/biases.\n",
-    "  [W1, b1, W2, b2, W3, b3, W4, b4] = mnist_lenet::train(X, y, X_val, 
y_val, C, Hin, Win, epochs)\n",
-    "\"\"\"\n",
-    "rets = ('W1', 'b1','W2','b2','W3','b3','W4','b4')\n",
-    "\n",
-    "script = (dml(script).input(data=trainData, epochs=1, C=1, Hin=28, 
Win=28)\n",
-    "                     .output(*rets))   \n",
-    "\n",
-    "W1, b1, W2, b2, W3, b3, W4, b4 = (ml.execute(script).get(*rets))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "collapsed": true
-   },
-   "source": [
-    "Use trained model and predict on test data, and evaluate the quality of 
the predictions for each digit."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": false
-   },
-   "outputs": [],
-   "source": [
-    "scriptPredict = \"\"\"\n",
-    "  source(\"mnist_lenet.dml\") as mnist_lenet\n",
-    "\n",
-    "  # Separate images from lables and scale images to [-1,1]\n",
-    "  X_test = data[,2:ncol(data)]\n",
-    "  X_test = (X_test / 255.0) * 2 - 1\n",
-    "\n",
-    "  # Predict\n",
-    "  probs = mnist_lenet::predict(X_test, C, Hin, Win, W1, b1, W2, b2, W3, 
b3, W4, b4)\n",
-    "  predictions = rowIndexMax(probs) - 1\n",
-    "\"\"\"\n",
-    "script = (dml(scriptPredict).input(data=testData, C=1, Hin=28, Win=28, 
W1=W1, b1=b1, W2=W2, b2=b2, W3=W3, b3=b3, W4=W4, b4=b4)\n",
-    "                            .output(\"predictions\"))\n",
-    "\n",
-    "predictions = ml.execute(script).get(\"predictions\").toNumPy()\n",
-    "\n",
-    "print classification_report(testData[:,0], predictions)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "<a id=\"predict\"></a>\n",
-    "## Detect handwritten Digits"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Define a function that randomly selects a test image, display the image, 
and scores it."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": false
-   },
-   "outputs": [],
-   "source": [
-    "img_size = np.sqrt(testData.shape[1] - 1)\n",
-    "\n",
-    "def displayImage(i):\n",
-    "    image = (testData[i,1:]).reshape((img_size, 
img_size)).astype(\"uint8\")\n",
-    "    imgplot = plt.imshow(image, cmap='gray')   "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": false
-   },
-   "outputs": [],
-   "source": [
-    "def predictImage(i):\n",
-    "    image = testData[i,:].reshape(1,testData.shape[1])\n",
-    "    prog = dml(scriptPredict).input(data=image, C=1, Hin=28, Win=28, 
W1=W1, b1=b1, W2=W2, b2=b2, W3=W3, b3=b3, W4=W4, b4=b4) \\\n",
-    "                             .output(\"predictions\")\n",
-    "    result = ml.execute(prog)\n",
-    "    return (result.get(\"predictions\").toNumPy())[0]"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": false,
-    "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "i = np.random.choice(np.arange(0, len(testData)), size = (1,))\n",
-    "\n",
-    "p = predictImage(i)\n",
-    "\n",
-    "print \"Image\", i, \"\\nPredicted digit:\", p, \"\\nActual digit: \", 
testData[i,0], \"\\nResult: \", (p == testData[i,0])\n",
-    "\n",
-    "displayImage(i)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": []
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": false
-   },
-   "outputs": [],
-   "source": [
-    "pd.set_option('display.max_columns', 28)\n",
-    "pd.DataFrame((testData[i,1:]).reshape(img_size, img_size),dtype='uint')"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": []
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 2",
-   "language": "python",
-   "name": "python2"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 2
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython2",
-   "version": "2.7.11"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 0
-}

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/7fe372b9/samples/jupyter-notebooks/Deep_Learning_Image_Classification.ipynb
----------------------------------------------------------------------
diff --git a/samples/jupyter-notebooks/Deep_Learning_Image_Classification.ipynb 
b/samples/jupyter-notebooks/Deep_Learning_Image_Classification.ipynb
new file mode 100644
index 0000000..d7f00c9
--- /dev/null
+++ b/samples/jupyter-notebooks/Deep_Learning_Image_Classification.ipynb
@@ -0,0 +1,406 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Deep Learning Image Classification\n",
+    "\n",
+    "This notebook shows SystemML Deep Learning functionality to map images of 
single digit numbers to their corresponding numeric representations. See 
[Getting Started with Deep Learning and 
Python](http://www.pyimagesearch.com/2014/09/22/getting-started-deep-learning-python/)
 for an explanation of the used deep learning concepts and assumptions.\n",
+    "\n",
+    "The downloaded MNIST dataset contains labeled images of handwritten 
digits, where each example is a 28x28 pixel image of grayscale values in the 
range [0,255] stretched out as 784 pixels, and each label is one of 10 possible 
digits in [0,9]. We download 60,000 training examples, and 10,000 test 
examples, where the format is \"label, pixel_1, pixel_2, ..., pixel_n\". We 
train a SystemML LeNet model. The results of the learning algorithms have an 
accuracy of 98 percent.\n",
+    "\n",
+    "1. [Install and load SystemML and other libraries](#load_systemml)\n",
+    "1. [Download and Access MNIST data](#access_data)\n",
+    "1. [Train a CNN classifier for MNIST handwritten digits](#train)\n",
+    "1. [Detect handwritten Digits](#predict)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true
+   },
+   "source": [
+    "<div style=\"text-align:center\" markdown=\"1\">\n",
+    "![Image of Image to 
Digit](https://www.wolfram.com/mathematica/new-in-10/enhanced-image-processing/HTMLImages.en/handwritten-digits-classification/smallthumb_10.gif)\n",
+    "Mapping images of numbers to numbers\n",
+    "</div>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<a id=\"load_systemml\"></a>\n",
+    "## Install and load SystemML and other libraries"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": false
+   },
+   "outputs": [],
+   "source": [
+    "#!pip install --user systemml>=0.14.0\n",
+    "!pip install 
~/git/systemml_scala/incubator-systemml/target/systemml-1.0.0-incubating-SNAPSHOT-python.tgz"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": false
+   },
+   "outputs": [],
+   "source": [
+    "!pip show systemml"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from systemml import MLContext, dml\n",
+    "\n",
+    "ml = MLContext(sc)\n",
+    "\n",
+    "print \"Spark Version:\", sc.version\n",
+    "print \"SystemML Version:\", ml.version()\n",
+    "print \"SystemML Built-Time:\", ml.buildTime()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "import warnings\n",
+    "warnings.filterwarnings(\"ignore\")\n",
+    "from sklearn import datasets\n",
+    "from sklearn.cross_validation import train_test_split\n",
+    "from sklearn.metrics import classification_report\n",
+    "import pandas as pd\n",
+    "import numpy as np\n",
+    "import matplotlib.pyplot as plt\n",
+    "#import matplotlib.image as mpimg\n",
+    "%matplotlib inline"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Create data directory."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "%%sh\n",
+    "mkdir -p data/mnist/\n",
+    "cd data/mnist/"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<a id=\"access_data\"></a>\n",
+    "## Download and Access MNIST data\n",
+    "\n",
+    "Download the [MNIST data from the MLData 
repository](http://mldata.org/repository/data/viewslug/mnist-original/), and 
then split and save."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "mnist = datasets.fetch_mldata(\"MNIST Original\")\n",
+    "\n",
+    "print \"Mnist data features:\", mnist.data.shape\n",
+    "print \"Mnist data label:\", mnist.target.shape\n",
+    "\n",
+    "trainX, testX, trainY, testY = train_test_split(mnist.data, 
mnist.target.astype(\"int0\"), test_size = 0.142857)\n",
+    "\n",
+    "trainD = np.concatenate((trainY.reshape(trainY.size, 1), 
trainX),axis=1)\n",
+    "testD  = np.concatenate((testY.reshape (testY.size, 1),  
testX),axis=1)\n",
+    "\n",
+    "print \"Images for training:\", trainD.shape\n",
+    "print \"Images used for testing:\", testD.shape\n",
+    "pix = int(np.sqrt(trainD.shape[1]))\n",
+    "print \"Each image is:\", pix, \"by\", pix, \"pixels\"\n",
+    "\n",
+    "np.savetxt('data/mnist/mnist_train.csv', trainD, fmt='%u', 
delimiter=\",\")\n",
+    "np.savetxt('data/mnist/mnist_test.csv', testD, fmt='%u', delimiter=\",\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Alternatively get the data from here."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "%%sh\n",
+    "curl -O https://pjreddie.com/media/files/mnist_train.csv\n";,
+    "curl -O https://pjreddie.com/media/files/mnist_test.csv\n";,
+    "wc -l mnist*"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Read the data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "trainData = np.genfromtxt('data/mnist/mnist_train.csv', 
delimiter=\",\")\n",
+    "testData  = np.genfromtxt('data/mnist/mnist_test.csv', 
delimiter=\",\")\n",
+    "\n",
+    "print \"Training data: \", trainData.shape\n",
+    "print \"Test data: \", testData.shape"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pd.set_option('display.max_columns', 200)\n",
+    "pd.DataFrame(testData[1:10,],dtype='uint')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<a id=\"train\"></a>\n",
+    "## Develop LeNet CNN classifier on Training Data"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<div style=\"text-align:center\" markdown=\"1\">\n",
+    "![Image of Image to 
Digit](http://www.ommegaonline.org/admin/journalassistance/picturegallery/896.jpg)\n",
+    "MNIST digit recognition â LeNet architecture\n",
+    "</div>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Train Model using SystemML LeNet CNN."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "(on a Mac Book, this takes approx. 5-6 mins for 1 epoch)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true,
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "script = \"\"\"\n",
+    "  source(\"nn/examples/mnist_lenet.dml\") as mnist_lenet\n",
+    "\n",
+    "  # Bind training data\n",
+    "  n = nrow(data)\n",
+    "\n",
+    "  # Extract images and labels\n",
+    "  images = data[,2:ncol(data)]\n",
+    "  labels = data[,1]\n",
+    "\n",
+    "  # Scale images to [-1,1], and one-hot encode the labels\n",
+    "  images = (images / 255.0) * 2 - 1\n",
+    "  labels = table(seq(1, n), labels+1, n, 10)\n",
+    "\n",
+    "  # Split into training (55,000 examples) and validation (5,000 
examples)\n",
+    "  X = images[5001:nrow(images),]\n",
+    "  X_val = images[1:5000,]\n",
+    "  y = labels[5001:nrow(images),]\n",
+    "  y_val = labels[1:5000,]\n",
+    "\n",
+    "  # Train the model using channel, height, and width to produce 
weights/biases.\n",
+    "  [W1, b1, W2, b2, W3, b3, W4, b4] = mnist_lenet::train(X, y, X_val, 
y_val, C, Hin, Win, epochs)\n",
+    "\"\"\"\n",
+    "rets = ('W1', 'b1','W2','b2','W3','b3','W4','b4')\n",
+    "\n",
+    "script = (dml(script).input(data=trainData, epochs=1, C=1, Hin=28, 
Win=28)\n",
+    "                     .output(*rets))   \n",
+    "\n",
+    "W1, b1, W2, b2, W3, b3, W4, b4 = (ml.execute(script).get(*rets))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true
+   },
+   "source": [
+    "Use trained model and predict on test data, and evaluate the quality of 
the predictions for each digit."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "scriptPredict = \"\"\"\n",
+    "  source(\"nn/examples/mnist_lenet.dml\") as mnist_lenet\n",
+    "\n",
+    "  # Separate images from lables and scale images to [-1,1]\n",
+    "  X_test = data[,2:ncol(data)]\n",
+    "  X_test = (X_test / 255.0) * 2 - 1\n",
+    "\n",
+    "  # Predict\n",
+    "  probs = mnist_lenet::predict(X_test, C, Hin, Win, W1, b1, W2, b2, W3, 
b3, W4, b4)\n",
+    "  predictions = rowIndexMax(probs) - 1\n",
+    "\"\"\"\n",
+    "script = (dml(scriptPredict).input(data=testData, C=1, Hin=28, Win=28, 
W1=W1, b1=b1, W2=W2, b2=b2, W3=W3, b3=b3, W4=W4, b4=b4)\n",
+    "                            .output(\"predictions\"))\n",
+    "\n",
+    "predictions = ml.execute(script).get(\"predictions\").toNumPy()\n",
+    "\n",
+    "print classification_report(testData[:,0], predictions)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<a id=\"predict\"></a>\n",
+    "## Detect handwritten Digits"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Define a function that randomly selects a test image, display the image, 
and scores it."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "img_size = np.sqrt(testData.shape[1] - 1)\n",
+    "\n",
+    "def displayImage(i):\n",
+    "    image = (testData[i,1:]).reshape((img_size, 
img_size)).astype(\"uint8\")\n",
+    "    imgplot = plt.imshow(image, cmap='gray')   "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "def predictImage(i):\n",
+    "    image = testData[i,:].reshape(1,testData.shape[1])\n",
+    "    prog = dml(scriptPredict).input(data=image, C=1, Hin=28, Win=28, 
W1=W1, b1=b1, W2=W2, b2=b2, W3=W3, b3=b3, W4=W4, b4=b4) \\\n",
+    "                             .output(\"predictions\")\n",
+    "    result = ml.execute(prog)\n",
+    "    return (result.get(\"predictions\").toNumPy())[0]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "i = np.random.choice(np.arange(0, len(testData)), size = (1,))\n",
+    "\n",
+    "p = predictImage(i)\n",
+    "\n",
+    "print \"Image\", i, \"\\nPredicted digit:\", p, \"\\nActual digit: \", 
testData[i,0], \"\\nResult: \", (p == testData[i,0])\n",
+    "\n",
+    "displayImage(i)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "pd.set_option('display.max_columns', 28)\n",
+    "pd.DataFrame((testData[i,1:]).reshape(img_size, img_size),dtype='uint')"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 2",
+   "language": "python",
+   "name": "python2"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 2
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython2",
+   "version": "2.7.11"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}

incubator-systemml git commit: [SYSTEMML-1606] Update notebook samples with latest code

Reply via email to