Repository: incubator-systemml
Updated Branches:
  refs/heads/master 651725651 -> cfc73fefe


New Jupyter Python Notebook to demonstrate SystemML Lenet on MNIST.

Notebook useful for meet up demos to show off pip installation, downlaod
MNIST, training Lenet CNN, test, and scoring.

Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo
Commit: 
http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/cfc73fef
Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/cfc73fef
Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/cfc73fef

Branch: refs/heads/master
Commit: cfc73fefee417226f37533a10ca07a00666ab581
Parents: 6517256
Author: Berthold Reinwald <[email protected]>
Authored: Mon Apr 10 16:56:18 2017 -0700
Committer: Berthold Reinwald <[email protected]>
Committed: Mon Apr 10 17:53:22 2017 -0700

----------------------------------------------------------------------
 .../Deep Learning Image Classification.ipynb    | 489 +++++++++++++++++++
 1 file changed, 489 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/cfc73fef/samples/jupyter-notebooks/Deep
 Learning Image Classification.ipynb
----------------------------------------------------------------------
diff --git a/samples/jupyter-notebooks/Deep Learning Image Classification.ipynb 
b/samples/jupyter-notebooks/Deep Learning Image Classification.ipynb
new file mode 100644
index 0000000..eac20fd
--- /dev/null
+++ b/samples/jupyter-notebooks/Deep Learning Image Classification.ipynb        
@@ -0,0 +1,489 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Deep Learning Image Classification\n",
+    "\n",
+    "This notebook shows SystemML Deep Learning functionality to map images of 
single digit numbers to their corresponding numeric representations. See 
[Getting Started with Deep Learning and 
Python](http://www.pyimagesearch.com/2014/09/22/getting-started-deep-learning-python/)
 for an explanation of the used deep learning concepts and assumptions.\n",
+    "\n",
+    "The downloaded MNIST dataset contains labeled images of handwritten 
digits, where each example is a 28x28 pixel image of grayscale values in the 
range [0,255] stretched out as 784 pixels, and each label is one of 10 possible 
digits in [0,9]. We download 60,000 training examples, and 10,000 test 
examples, where the format is \"label, pixel_1, pixel_2, ..., pixel_n\". We 
train a SystemML LeNet model. The results of the learning algorithms have an 
accuracy of 98 percent.\n",
+    "\n",
+    "1. [Install and load SystemML and other libraries](#load_systemml)\n",
+    "1. [Download and Access MNIST data](#access_data)\n",
+    "1. [Train a CNN classifier for MNIST handwritten digits](#train)\n",
+    "1. [Detect handwritten Digits](#predict)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true
+   },
+   "source": [
+    "<div style=\"text-align:center\" markdown=\"1\">\n",
+    "![Image of Image to 
Digit](https://www.wolfram.com/mathematica/new-in-10/enhanced-image-processing/HTMLImages.en/handwritten-digits-classification/smallthumb_10.gif)\n",
+    "Mapping images of numbers to numbers\n",
+    "</div>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<a id=\"load_systemml\"></a>\n",
+    "## Install and load SystemML and other libraries"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false,
+    "scrolled": false
+   },
+   "outputs": [],
+   "source": [
+    "#!pip install --user systemml>0.13.0\n",
+    "!pip install 
~/git/incubator-systemml/target/systemml-0.15.0-incubating-SNAPSHOT-python.tgz"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false,
+    "scrolled": false
+   },
+   "outputs": [],
+   "source": [
+    "!pip show systemml"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false,
+    "scrolled": false
+   },
+   "outputs": [],
+   "source": [
+    "# Create symbolic link in ~/data/libs to use installed site-packages 
SystemML jar as opposed to DSX platform SystemML jar\n",
+    "# !ln -s -f 
~/.local/lib/python2.7/site-packages/systemml/systemml-java/*.jar ~/data/libs/"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "from systemml import MLContext, dml\n",
+    "\n",
+    "ml = MLContext(sc)\n",
+    "\n",
+    "print \"Spark Version:\", sc.version\n",
+    "print \"SystemML Version:\", ml.version()\n",
+    "print \"SystemML Built-Time:\", ml.buildTime()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "import warnings\n",
+    "warnings.filterwarnings(\"ignore\")\n",
+    "from sklearn import datasets\n",
+    "from sklearn.cross_validation import train_test_split\n",
+    "from sklearn.metrics import classification_report\n",
+    "import pandas as pd\n",
+    "import numpy as np\n",
+    "import matplotlib.pyplot as plt\n",
+    "#import matplotlib.image as mpimg\n",
+    "%matplotlib inline"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<a id=\"access_data\"></a>\n",
+    "## Download and Access MNIST data\n",
+    "\n",
+    "Download the [MNIST data from the MLData 
repository](http://mldata.org/repository/data/viewslug/mnist-original/), and 
then split and save."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false,
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "mnist = datasets.fetch_mldata(\"MNIST Original\")\n",
+    "\n",
+    "print \"Mnist data features:\", mnist.data.shape\n",
+    "print \"Mnist data label:\", mnist.target.shape\n",
+    "\n",
+    "trainX, testX, trainY, testY = train_test_split(mnist.data, 
mnist.target.astype(\"int0\"), test_size = 0.142857)\n",
+    "\n",
+    "trainD = np.concatenate((trainY.reshape(trainY.size, 1), 
trainX),axis=1)\n",
+    "testD  = np.concatenate((testY.reshape (testY.size, 1),  
testX),axis=1)\n",
+    "\n",
+    "print \"Images for training:\", trainD.shape\n",
+    "print \"Images used for testing:\", testD.shape\n",
+    "pix = int(np.sqrt(trainD.shape[1]))\n",
+    "print \"Each image is:\", pix, \"by\", pix, \"pixels\"\n",
+    "\n",
+    "np.savetxt('data/mnist/mnist_train.csv', trainD, fmt='%u', 
delimiter=\",\")\n",
+    "np.savetxt('data/mnist/mnist_test.csv', testD, fmt='%u', delimiter=\",\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Alternatively get the data from here."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false,
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "%%sh\n",
+    "mkdir -p data/mnist/\n",
+    "cd data/mnist/\n",
+    "curl -O https://pjreddie.com/media/files/mnist_train.csv\n";,
+    "curl -O https://pjreddie.com/media/files/mnist_test.csv\n";,
+    "wc -l mnist*"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Read the data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "trainData = np.genfromtxt('data/mnist/mnist_train.csv', 
delimiter=\",\")\n",
+    "testData  = np.genfromtxt('data/mnist/mnist_test.csv', 
delimiter=\",\")\n",
+    "\n",
+    "print \"Training data: \", trainData.shape\n",
+    "print \"Test data: \", testData.shape"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "pd.set_option('display.max_columns', 200)\n",
+    "pd.DataFrame(testData[1:10,],dtype='uint')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<a id=\"train\"></a>\n",
+    "## Develop LeNet CNN classifier on Training Data"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<div style=\"text-align:center\" markdown=\"1\">\n",
+    "![Image of Image to 
Digit](http://www.ommegaonline.org/admin/journalassistance/picturegallery/896.jpg)\n",
+    "MNIST digit recognition – LeNet architecture\n",
+    "</div>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Download the SystemML LeNet Implementation"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false,
+    "scrolled": false
+   },
+   "outputs": [],
+   "source": [
+    "!wget -N -q 
'https://raw.githubusercontent.com/apache/incubator-systemml/master/scripts/staging/SystemML-NN/examples/mnist_lenet.dml'\n",
+    "#!cat mnist_lenet.dml"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false,
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "#import requests\n",
+    "#get = 
requests.get('https://raw.githubusercontent.com/apache/incubator-systemml/master/scripts/staging/SystemML-NN/examples/mnist_lenet.dml')\n",
+    "#mnist_lenet = get.text.replace(\"epochs = 10\", \"epochs = 1\")\n",
+    "#with open('mnist_lenet.dml','wb') as out:\n",
+    "#    out.write(mnist_lenet)\n",
+    "#print mnist_lenet    "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Download SystemML neural network library"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "!svn --force export 
https://github.com/apache/incubator-systemml/trunk/scripts/staging/SystemML-NN/nn";
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Train Model using SystemML LeNet CNN."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "(on a Mac Book, this takes approx. 5-6 mins for 1 epoch)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false,
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "script = \"\"\"\n",
+    "  source(\"mnist_lenet.dml\") as mnist_lenet\n",
+    "\n",
+    "  # Bind training data\n",
+    "  n = nrow(data)\n",
+    "\n",
+    "  # Extract images and labels\n",
+    "  images = data[,2:ncol(data)]\n",
+    "  labels = data[,1]\n",
+    "\n",
+    "  # Scale images to [-1,1], and one-hot encode the labels\n",
+    "  images = (images / 255.0) * 2 - 1\n",
+    "  labels = table(seq(1, n), labels+1, n, 10)\n",
+    "\n",
+    "  # Split into training (55,000 examples) and validation (5,000 
examples)\n",
+    "  X = images[5001:nrow(images),]\n",
+    "  X_val = images[1:5000,]\n",
+    "  y = labels[5001:nrow(images),]\n",
+    "  y_val = labels[1:5000,]\n",
+    "\n",
+    "  # Train the model using channel, height, and width to produce 
weights/biases.\n",
+    "  [W1, b1, W2, b2, W3, b3, W4, b4] = mnist_lenet::train(X, y, X_val, 
y_val, C, Hin, Win, epochs)\n",
+    "\"\"\"\n",
+    "rets = ('W1', 'b1','W2','b2','W3','b3','W4','b4')\n",
+    "\n",
+    "script = (dml(script).input(data=trainData, epochs=1, C=1, Hin=28, 
Win=28)\n",
+    "                     .output(*rets))   \n",
+    "\n",
+    "W1, b1, W2, b2, W3, b3, W4, b4 = (ml.execute(script).get(*rets))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true
+   },
+   "source": [
+    "Use trained model and predict on test data, and evaluate the quality of 
the predictions for each digit."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "scriptPredict = \"\"\"\n",
+    "  source(\"mnist_lenet.dml\") as mnist_lenet\n",
+    "\n",
+    "  # Separate images from lables and scale images to [-1,1]\n",
+    "  X_test = data[,2:ncol(data)]\n",
+    "  X_test = (X_test / 255.0) * 2 - 1\n",
+    "\n",
+    "  # Predict\n",
+    "  probs = mnist_lenet::predict(X_test, C, Hin, Win, W1, b1, W2, b2, W3, 
b3, W4, b4)\n",
+    "  predictions = rowIndexMax(probs) - 1\n",
+    "\"\"\"\n",
+    "script = (dml(scriptPredict).input(data=testData, C=1, Hin=28, Win=28, 
W1=W1, b1=b1, W2=W2, b2=b2, W3=W3, b3=b3, W4=W4, b4=b4)\n",
+    "                            .output(\"predictions\"))\n",
+    "\n",
+    "predictions = ml.execute(script).get(\"predictions\").toNumPy()\n",
+    "\n",
+    "print classification_report(testData[:,0], predictions)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<a id=\"predict\"></a>\n",
+    "## Detect handwritten Digits"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Define a function that randomly selects a test image, display the image, 
and scores it."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "img_size = np.sqrt(testData.shape[1] - 1)\n",
+    "\n",
+    "def displayImage(i):\n",
+    "    image = (testData[i,1:]).reshape((img_size, 
img_size)).astype(\"uint8\")\n",
+    "    imgplot = plt.imshow(image, cmap='gray')   "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "def predictImage(i):\n",
+    "    image = testData[i,:].reshape(1,testData.shape[1])\n",
+    "    prog = dml(scriptPredict).input(data=image, C=1, Hin=28, Win=28, 
W1=W1, b1=b1, W2=W2, b2=b2, W3=W3, b3=b3, W4=W4, b4=b4) \\\n",
+    "                             .output(\"predictions\")\n",
+    "    result = ml.execute(prog)\n",
+    "    return (result.get(\"predictions\").toNumPy())[0]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false,
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "i = np.random.choice(np.arange(0, len(testData)), size = (1,))\n",
+    "\n",
+    "p = predictImage(i)\n",
+    "\n",
+    "print \"Image\", i, \"\\nPredicted digit:\", p, \"\\nActual digit: \", 
testData[i,0], \"\\nResult: \", (p == testData[i,0])\n",
+    "\n",
+    "displayImage(i)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "pd.set_option('display.max_columns', 28)\n",
+    "pd.DataFrame((testData[i,1:]).reshape(img_size, img_size),dtype='uint')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 2",
+   "language": "python",
+   "name": "python2"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 2
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython2",
+   "version": "2.7.11"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}

Reply via email to