systemml git commit: [Minor]: fix notebooks

reinwald Thu, 07 Dec 2017 19:46:31 -0800

Repository: systemml
Updated Branches:
  refs/heads/master 0ef6b9246 -> bafed4986



[Minor]: fix notebooks

Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/bafed498
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/bafed498
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/bafed498

Branch: refs/heads/master
Commit: bafed4986fdbfc7977fa66d359c726c84ad65976
Parents: 0ef6b92
Author: Berthold Reinwald <[email protected]>
Authored: Thu Dec 7 19:43:50 2017 -0800
Committer: Berthold Reinwald <[email protected]>
Committed: Thu Dec 7 19:43:50 2017 -0800

----------------------------------------------------------------------
 .../Deep Learning Image Classification.ipynb    | 408 +++++++++++++
 .../Linear Regression Algorithms Demo.ipynb     | 595 +++++++++++++++++++
 2 files changed, 1003 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/systemml/blob/bafed498/samples/jupyter-notebooks/Deep
 Learning Image Classification.ipynb
----------------------------------------------------------------------
diff --git a/samples/jupyter-notebooks/Deep Learning Image Classification.ipynb 
b/samples/jupyter-notebooks/Deep Learning Image Classification.ipynb
new file mode 100644
index 0000000..61617fc
--- /dev/null
+++ b/samples/jupyter-notebooks/Deep Learning Image Classification.ipynb        
@@ -0,0 +1,408 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Deep Learning Image Classification\n",
+    "\n",
+    "This notebook uses SystemML's deep learning functionality to map images of single-digit numbers to their numeric values. See [Getting Started with Deep Learning and Python](http://www.pyimagesearch.com/2014/09/22/getting-started-deep-learning-python/) for an explanation of the deep learning concepts and assumptions used.\n",
+    "\n",
+    "The downloaded MNIST dataset contains labeled images of handwritten digits, where each example is a 28x28 pixel image of grayscale values in the range [0,255] flattened into 784 pixels, and each label is one of 10 possible digits in [0,9]. We download 60,000 training examples and 10,000 test examples, where the format is \"label, pixel_1, pixel_2, ..., pixel_n\". We train a SystemML LeNet model, which reaches an accuracy of 98 percent.\n",
+    "\n",
+    "1. [Install and load SystemML and other libraries](#load_systemml)\n",
+    "1. [Download and Access MNIST data](#access_data)\n",
+    "1. [Train a CNN classifier for MNIST handwritten digits](#train)\n",
+    "1. [Detect handwritten Digits](#predict)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true
+   },
+   "source": [
+    "<div style=\"text-align:center\" markdown=\"1\">\n",
+    "![Image of Image to 
Digit](https://www.wolfram.com/mathematica/new-in-10/enhanced-image-processing/HTMLImages.en/handwritten-digits-classification/smallthumb_10.gif)\n",
+    "Mapping images of numbers to numbers\n",
+    "</div>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<a id=\"load_systemml\"></a>\n",
+    "## Install and load SystemML and other libraries"
+   ]
+  },
+  {
+   "cell_type": "raw",
+   "metadata": {
+    "scrolled": true
+   },
+   "source": [
+    "!pip uninstall systemml --y\n",
+    "!pip install --user 
https://repository.apache.org/content/groups/snapshots/org/apache/systemml/systemml/1.0.0-SNAPSHOT/systemml-1.0.0-20171201.070207-23-python.tar.gz"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "from systemml import MLContext, dml\n",
+    "\n",
+    "ml = MLContext(sc)\n",
+    "\n",
+    "print \"Spark Version:\", sc.version\n",
+    "print \"SystemML Version:\", ml.version()\n",
+    "print \"SystemML Built-Time:\", ml.buildTime()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "from sklearn import datasets\n",
+    "from sklearn.cross_validation import train_test_split\n",
+    "from sklearn.metrics import classification_report\n",
+    "import pandas as pd\n",
+    "import numpy as np\n",
+    "import matplotlib.pyplot as plt\n",
+    "%matplotlib inline\n",
+    "import warnings\n",
+    "warnings.filterwarnings(\"ignore\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<a id=\"access_data\"></a>\n",
+    "## Download and Access MNIST data\n",
+    "\n",
+    "Download the [MNIST data from the MLData 
repository](http://mldata.org/repository/data/viewslug/mnist-original/), and 
then split and save."
+   ]
+  },
+  {
+   "cell_type": "raw",
+   "metadata": {
+    "scrolled": false
+   },
+   "source": [
+    "mnist = datasets.fetch_mldata(\"MNIST Original\")\n",
+    "\n",
+    "print \"Mnist data features:\", mnist.data.shape\n",
+    "print \"Mnist data label:\", mnist.target.shape\n",
+    "\n",
+    "trainX, testX, trainY, testY = train_test_split(mnist.data, 
mnist.target.astype(\"int0\"), test_size = 0.142857)\n",
+    "\n",
+    "trainD = np.concatenate((trainY.reshape(trainY.size, 1), 
trainX),axis=1)\n",
+    "testD  = np.concatenate((testY.reshape (testY.size, 1),  
testX),axis=1)\n",
+    "\n",
+    "print \"Images for training:\", trainD.shape\n",
+    "print \"Images used for testing:\", testD.shape\n",
+    "pix = int(np.sqrt(trainD.shape[1]))\n",
+    "print \"Each image is:\", pix, \"by\", pix, \"pixels\"\n",
+    "\n",
+    "np.savetxt('mnist/mnist_train.csv', trainD, fmt='%u', delimiter=\",\")\n",
+    "np.savetxt('mnist/mnist_test.csv', testD, fmt='%u', delimiter=\",\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Read the data."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "trainData = np.genfromtxt('mnist/mnist_train.csv', delimiter=\",\")\n",
+    "testData  = np.genfromtxt('mnist/mnist_test.csv', delimiter=\",\")\n",
+    "\n",
+    "print \"Training data: \", trainData.shape\n",
+    "print \"Test data: \", testData.shape"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "pd.set_option('display.max_columns', 200)\n",
+    "pd.DataFrame(testData[1:10,],dtype='uint')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<a id=\"train\"></a>\n",
+    "## Develop LeNet CNN classifier on Training Data"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<div style=\"text-align:center\" markdown=\"1\">\n",
+    "![Image of Image to 
Digit](http://www.ommegaonline.org/admin/journalassistance/picturegallery/896.jpg)\n",
+    "MNIST digit recognition - LeNet architecture\n",
+    "</div>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### (Optional) Display SystemML LeNet Implementation"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "!jar -xf 
~/.local/lib/python2.7/site-packages/systemml/systemml-java/systemml*.jar 
scripts/nn/examples/mnist_lenet.dml\n",
+    "!cat scripts/nn/examples/mnist_lenet.dml"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Train Model using SystemML LeNet CNN."
+   ]
+  },
+  {
+   "cell_type": "raw",
+   "metadata": {},
+   "source": [
+    "ml.setGPU(True).setForceGPU(True)"
+   ]
+  },
+  {
+   "cell_type": "raw",
+   "metadata": {},
+   "source": [
+    "ml.setStatistics(False)"
+   ]
+  },
+  {
+   "cell_type": "raw",
+   "metadata": {},
+   "source": [
+    "ml.setExplain(True).setExplainLevel('runtime')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false,
+    "scrolled": false
+   },
+   "outputs": [],
+   "source": [
+    "script = \"\"\"\n",
+    "  source(\"scripts/nn/examples/mnist_lenet.dml\") as mnist_lenet\n",
+    "  # Bind data; Extract images and labels\n",
+    "  n = nrow(data)\n",
+    "  images = data[,2:ncol(data)]\n",
+    "  labels = data[,1]\n",
+    "\n",
+    "  # Scale images to [-1,1], and one-hot encode the labels\n",
+    "  images = (images / 255.0) * 2 - 1\n",
+    "  labels = table(seq(1, n), labels+1, n, 10)\n",
+    "\n",
+    "  # Split data into training (55,000 examples) and validation (5,000 
examples)\n",
+    "  X = images[5001:nrow(images),]\n",
+    "  X_val = images[1:5000,]\n",
+    "  y = labels[5001:nrow(images),]\n",
+    "  y_val = labels[1:5000,]\n",
+    "\n",
+    "  # Train the model using channel, height, and width to produce 
weights/biases.\n",
+    "  [W1, b1, W2, b2, W3, b3, W4, b4] = mnist_lenet::train(X, y, X_val, 
y_val, C, Hin, Win, epochs)\n",
+    "\"\"\"\n",
+    "rets = ('W1', 'b1','W2','b2','W3','b3','W4','b4')\n",
+    "\n",
+    "script = (dml(script).input(data=trainData, epochs=1, C=1, Hin=28, 
Win=28)\n",
+    "                     .output(*rets))   \n",
+    "\n",
+    "W1, b1, W2, b2, W3, b3, W4, b4 = ml.execute(script).get(*rets)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true
+   },
+   "source": [
+    "Use the trained model to predict on the test data, and evaluate the quality of the predictions for each digit."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "scriptPredict = \"\"\"\n",
+    "  source(\"scripts/nn/examples/mnist_lenet.dml\") as mnist_lenet\n",
+    "\n",
+    "  # Separate images from labels and scale images to [-1,1]\n",
+    "  X_test = data[,2:ncol(data)]\n",
+    "  X_test = (X_test / 255.0) * 2 - 1\n",
+    "\n",
+    "  # Predict\n",
+    "  probs = mnist_lenet::predict(X_test, C, Hin, Win, W1, b1, W2, b2, W3, 
b3, W4, b4)\n",
+    "  predictions = rowIndexMax(probs) - 1\n",
+    "\"\"\"\n",
+    "script = (dml(scriptPredict).input(data=testData, C=1, Hin=28, Win=28, 
W1=W1, b1=b1, W2=W2, b2=b2, W3=W3, b3=b3, W4=W4, b4=b4)\n",
+    "                            .output(\"predictions\"))\n",
+    "\n",
+    "predictions = ml.execute(script).get(\"predictions\").toNumPy()\n",
+    "\n",
+    "print classification_report(testData[:,0], predictions)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<a id=\"predict\"></a>\n",
+    "## Detect handwritten Digits"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Define functions that display a test image and score it, then apply them to a randomly selected test image."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "img_size = np.sqrt(testData.shape[1] - 1).astype(\"uint8\")\n",
+    "\n",
+    "def displayImage(i):\n",
+    "    image = testData[i,1:].reshape((img_size, 
img_size)).astype(\"uint8\")\n",
+    "    imgplot = plt.imshow(image, cmap='gray')   "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "def predictImage(i):\n",
+    "    image = testData[i,:].reshape(1,testData.shape[1])\n",
+    "    prog = dml(scriptPredict).input(data=image, C=1, Hin=28, Win=28, 
W1=W1, b1=b1, W2=W2, b2=b2, W3=W3, b3=b3, W4=W4, b4=b4) \\\n",
+    "                             .output(\"predictions\")\n",
+    "    result = ml.execute(prog)\n",
+    "    return (result.get(\"predictions\").toNumPy())[0]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false,
+    "scrolled": false
+   },
+   "outputs": [],
+   "source": [
+    "i = np.random.choice(np.arange(0, len(testData)), size = (1,))\n",
+    "\n",
+    "p = predictImage(i)\n",
+    "\n",
+    "print \"Image\", i, \"\\nPredicted digit:\", p, \"\\nActual digit: \", 
testData[i,0], \"\\nResult: \", (p == testData[i,0])\n",
+    "\n",
+    "displayImage(i)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": []
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "pd.set_option('display.max_columns', 28)\n",
+    "pd.DataFrame((testData[i,1:]).reshape(img_size, img_size),dtype='uint')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 2",
+   "language": "python",
+   "name": "python2"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 2
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython2",
+   "version": "2.7.11"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}
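
The preprocessing and prediction steps in the DML scripts of the notebook above (scaling pixels to [-1,1], one-hot encoding the labels with `table(...)`, and taking `rowIndexMax(probs) - 1`) can be sketched in plain Python. This is only an illustration of the arithmetic, not SystemML code; the function names `scale_pixels`, `one_hot`, and `predict_digit` are hypothetical and do not appear in the notebook.

```python
def scale_pixels(row):
    """Map grayscale values in [0, 255] to [-1, 1] via (x / 255.0) * 2 - 1."""
    return [(x / 255.0) * 2 - 1 for x in row]


def one_hot(label, num_classes=10):
    """Return a vector with a 1 at index `label` and 0 elsewhere.

    The DML script builds the same encoding for all rows at once with
    table(seq(1, n), labels+1, n, 10); the +1 there accounts for
    1-based column indexing.
    """
    vec = [0] * num_classes
    vec[label] = 1
    return vec


def predict_digit(probs):
    """Return the 0-based index of the largest class probability.

    This plays the role of rowIndexMax(probs) - 1 for a single row.
    """
    return max(range(len(probs)), key=lambda k: probs[k])


# Usage on toy values (not real MNIST data)
scaled = scale_pixels([0, 255])
encoded = one_hot(3)
digit = predict_digit([0.1, 0.7, 0.2])
```

Because SystemML matrices are 1-based, the script shifts labels by +1 before encoding and shifts the predicted column index by -1 afterwards; the 0-based Python sketch needs neither shift.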

http://git-wip-us.apache.org/repos/asf/systemml/blob/bafed498/samples/jupyter-notebooks/Linear
 Regression Algorithms Demo.ipynb
----------------------------------------------------------------------
diff --git a/samples/jupyter-notebooks/Linear Regression Algorithms Demo.ipynb 
b/samples/jupyter-notebooks/Linear Regression Algorithms Demo.ipynb
new file mode 100644
index 0000000..001f402
--- /dev/null
+++ b/samples/jupyter-notebooks/Linear Regression Algorithms Demo.ipynb 
@@ -0,0 +1,595 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Linear Regression Algorithms using Apache SystemML\n",
+    "\n",
+    "This notebook shows:\n",
+    "- Install SystemML Python package and jar file\n",
+    "  - pip\n",
+    "  - SystemML 'Hello World'\n",
+    "- Example 1: Matrix Multiplication\n",
+    "  - SystemML script to generate a random matrix, perform matrix 
multiplication, and compute the sum of the output\n",
+    "  - Examine execution plans, and increase data size to observe changed execution plans\n",
+    "- Load diabetes dataset from scikit-learn\n",
+    "- Example 2: Implement three different algorithms to train linear 
regression model\n",
+    "  - Algorithm 1: Linear Regression - Direct Solve (no regularization)\n",
+    "  - Algorithm 2: Linear Regression - Batch Gradient Descent (no 
regularization)\n",
+    "  - Algorithm 3: Linear Regression - Conjugate Gradient (no 
regularization)\n",
+    "- Example 3: Invoke existing SystemML algorithm script LinearRegDS.dml 
using MLContext API\n",
+    "- Example 4: Invoke existing SystemML algorithm using scikit-learn/SparkML pipeline-like API\n",
+    "- Uninstall/Clean up SystemML Python package and jar file"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Install SystemML Python package and jar file"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "!pip uninstall systemml --y\n",
+    "!pip install --user 
https://repository.apache.org/content/groups/snapshots/org/apache/systemml/systemml/1.0.0-SNAPSHOT/systemml-1.0.0-20171201.070207-23-python.tar.gz"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "!pip show systemml"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Import SystemML API "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "from systemml import MLContext, dml, dmlFromResource\n",
+    "\n",
+    "ml = MLContext(sc)\n",
+    "\n",
+    "print \"Spark Version:\", sc.version\n",
+    "print \"SystemML Version:\", ml.version()\n",
+    "print \"SystemML Built-Time:\", ml.buildTime()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "ml.execute(dml(\"\"\"s = 'Hello World!'\"\"\").output(\"s\")).get(\"s\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Import numpy, sklearn, and define some helper functions"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "import matplotlib.pyplot as plt\n",
+    "import numpy as np\n",
+    "from sklearn import datasets\n",
+    "plt.switch_backend('agg')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Example 1: Matrix Multiplication"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### SystemML script to generate a random matrix, perform matrix 
multiplication, and compute the sum of the output"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true,
+    "slideshow": {
+     "slide_type": "-"
+    }
+   },
+   "outputs": [],
+   "source": [
+    "script = \"\"\"\n",
+    "    X = rand(rows=$nr, cols=1000, sparsity=0.5)\n",
+    "    A = t(X) %*% X\n",
+    "    s = sum(A)\n",
+    "\"\"\""
+   ]
+  },
+  {
+   "cell_type": "raw",
+   "metadata": {},
+   "source": [
+    "ml.setStatistics(False)"
+   ]
+  },
+  {
+   "cell_type": "raw",
+   "metadata": {},
+   "source": [
+    "ml.setExplain(True).setExplainLevel(\"runtime\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "prog = dml(script).input('$nr', 1e5).output('s')\n",
+    "s = ml.execute(prog).get('s')\n",
+    "print (s)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Load diabetes dataset from scikit-learn "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "%matplotlib inline"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "diabetes = datasets.load_diabetes()\n",
+    "diabetes_X = diabetes.data[:, np.newaxis, 2]\n",
+    "diabetes_X_train = diabetes_X[:-20]\n",
+    "diabetes_X_test = diabetes_X[-20:]\n",
+    "diabetes_y_train = diabetes.target[:-20].reshape(-1,1)\n",
+    "diabetes_y_test = diabetes.target[-20:].reshape(-1,1)\n",
+    "\n",
+    "plt.scatter(diabetes_X_train, diabetes_y_train,  color='black')\n",
+    "plt.scatter(diabetes_X_test, diabetes_y_test,  color='red')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "diabetes.data.shape"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Example 2: Implement three different algorithms to train linear 
regression model"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true
+   },
+   "source": [
+    "## Algorithm 1: Linear Regression - Direct Solve (no regularization) "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Least squares formulation\n",
+    "$w^* = \\arg\\min_w \\|Xw - y\\|^2 = \\arg\\min_w (y - Xw)'(y - Xw) = \\arg\\min_w \\left(\\tfrac{1}{2} w'(X'X)w - w'(X'y)\\right)$\n",
+    "\n",
+    "#### Setting the gradient\n",
+    "$dw = (X'X)w - (X'y)$ to 0 gives $w = (X'X)^{-1}(X'y)$ = `solve(X'X, X'y)`"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "script = \"\"\"\n",
+    "    # add constant feature to X to model intercept\n",
+    "    X = cbind(X, matrix(1, rows=nrow(X), cols=1))\n",
+    "    A = t(X) %*% X\n",
+    "    b = t(X) %*% y\n",
+    "    w = solve(A, b)\n",
+    "    bias = as.scalar(w[nrow(w),1])\n",
+    "    w = w[1:nrow(w)-1,]\n",
+    "\"\"\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "prog = dml(script).input(X=diabetes_X_train, 
y=diabetes_y_train).output('w', 'bias')\n",
+    "w, bias = ml.execute(prog).get('w','bias')\n",
+    "w = w.toNumPy()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "plt.scatter(diabetes_X_train, diabetes_y_train,  color='black')\n",
+    "plt.scatter(diabetes_X_test, diabetes_y_test,  color='red')\n",
+    "\n",
+    "plt.plot(diabetes_X_test, (w*diabetes_X_test)+bias, color='blue', 
linestyle ='dotted')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {
+    "collapsed": true
+   },
+   "source": [
+    "## Algorithm 2: Linear Regression - Batch Gradient Descent (no 
regularization)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Algorithm\n",
+    "`Step 1: Start with an initial point \n",
+    "while(not converged) { \n",
+    "  Step 2: Compute gradient dw. \n",
+    "  Step 3: Compute stepsize alpha.     \n",
+    "  Step 4: Update: w_new = w_old + alpha*dw \n",
+    "}`\n",
+    "\n",
+    "#### Gradient formula\n",
+    "`dw = r = (X'X)w - (X'y)`\n",
+    "\n",
+    "#### Step size formula\n",
+    "`Find number alpha to minimize f(w + alpha*r) \n",
+    "alpha = -(r'r)/(r'X'Xr)`\n",
+    "\n",
+    "![Gradient 
Descent](http://blog.datumbox.com/wp-content/uploads/2013/10/gradient-descent.png)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "script = \"\"\"\n",
+    "    # add constant feature to X to model intercepts\n",
+    "    X = cbind(X, matrix(1, rows=nrow(X), cols=1))\n",
+    "    max_iter = 100\n",
+    "    w = matrix(0, rows=ncol(X), cols=1)\n",
+    "    for(i in 1:max_iter){\n",
+    "        XtX = t(X) %*% X\n",
+    "        dw = XtX %*%w - t(X) %*% y\n",
+    "        alpha = -(t(dw) %*% dw) / (t(dw) %*% XtX %*% dw)\n",
+    "        w = w + dw*alpha\n",
+    "    }\n",
+    "    bias = as.scalar(w[nrow(w),1])\n",
+    "    w = w[1:nrow(w)-1,]    \n",
+    "\"\"\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "prog = dml(script).input(X=diabetes_X_train, 
y=diabetes_y_train).output('w', 'bias')\n",
+    "w, bias = ml.execute(prog).get('w', 'bias')\n",
+    "w = w.toNumPy()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "plt.scatter(diabetes_X_train, diabetes_y_train,  color='black')\n",
+    "plt.scatter(diabetes_X_test, diabetes_y_test,  color='red')\n",
+    "\n",
+    "plt.plot(diabetes_X_test, (w*diabetes_X_test)+bias, color='red', 
linestyle ='dashed')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Algorithm 3: Linear Regression - Conjugate Gradient (no regularization)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Problem with gradient descent: Takes very similar directions many 
times\n",
+    "\n",
+    "Solution: Enforce conjugacy\n",
+    "\n",
+    "`Step 1: Start with an initial point \n",
+    "while(not converged) {\n",
+    "   Step 2: Compute gradient dw.\n",
+    "   Step 3: Compute stepsize alpha.\n",
+    "   Step 4: Compute next direction p by enforcing conjugacy with previous 
direction.\n",
+    "   Step 5: Update: w_new = w_old + alpha*p\n",
+    "}`\n",
+    "\n",
+    "![Gradient Descent vs Conjugate 
Gradient](http://i.stack.imgur.com/zh1HH.png)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "script = \"\"\"\n",
+    "    # add constant feature to X to model intercepts\n",
+    "    X = cbind(X, matrix(1, rows=nrow(X), cols=1))\n",
+    "    m = ncol(X); i = 1; \n",
+    "    max_iter = 20;\n",
+    "    w = matrix (0, rows = m, cols = 1); # initialize weights to 0\n",
+    "    dw = - t(X) %*% y; p = - dw;        # dw = (X'X)w - (X'y)\n",
+    "    norm_r2 = sum (dw ^ 2); \n",
+    "    for(i in 1:max_iter) {\n",
+    "        q = t(X) %*% (X %*% p)\n",
+    "        alpha = norm_r2 / sum (p * q);  # Minimizes f(w + alpha*p)\n",
+    "        w = w + alpha * p;              # update weights\n",
+    "        dw = dw + alpha * q;           \n",
+    "        old_norm_r2 = norm_r2; norm_r2 = sum (dw ^ 2);\n",
+    "        p = -dw + (norm_r2 / old_norm_r2) * p; # next direction - 
conjugacy to previous direction\n",
+    "        i = i + 1;\n",
+    "    }\n",
+    "    bias = as.scalar(w[nrow(w),1])\n",
+    "    w = w[1:nrow(w)-1,]    \n",
+    "\"\"\""
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "prog = dml(script).input(X=diabetes_X_train, 
y=diabetes_y_train).output('w', 'bias')\n",
+    "w, bias = ml.execute(prog).get('w','bias')\n",
+    "w = w.toNumPy()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "plt.scatter(diabetes_X_train, diabetes_y_train,  color='black')\n",
+    "plt.scatter(diabetes_X_test, diabetes_y_test,  color='red')\n",
+    "\n",
+    "plt.plot(diabetes_X_test, (w*diabetes_X_test)+bias, color='red', 
linestyle ='dashed')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Example 3: Invoke existing SystemML algorithm script LinearRegDS.dml 
using MLContext API"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "prog = 
dmlFromResource('scripts/algorithms/LinearRegDS.dml').input(X=diabetes_X_train, 
y=diabetes_y_train).input('$icpt',1.0).output('beta_out')\n",
+    "w = ml.execute(prog).get('beta_out')\n",
+    "w = w.toNumPy()\n",
+    "bias=w[1]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "plt.scatter(diabetes_X_train, diabetes_y_train,  color='black')\n",
+    "plt.scatter(diabetes_X_test, diabetes_y_test,  color='red')\n",
+    "\n",
+    "plt.plot(diabetes_X_test, (w[0]*diabetes_X_test)+bias, color='red', 
linestyle ='dashed')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Example 4: Invoke existing SystemML algorithm using scikit-learn/SparkML pipeline-like API"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The *mllearn* API allows a Python programmer to invoke SystemML's algorithms through a scikit-learn-like API as well as Spark's MLPipeline API."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "from pyspark.sql import SQLContext\n",
+    "from systemml.mllearn import LinearRegression\n",
+    "sqlCtx = SQLContext(sc)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "regr = LinearRegression(sqlCtx)\n",
+    "# Train the model using the training sets\n",
+    "regr.fit(diabetes_X_train, diabetes_y_train)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "predictions = regr.predict(diabetes_X_test)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "# Use the trained model to perform prediction\n",
+    "%matplotlib inline\n",
+    "plt.scatter(diabetes_X_train, diabetes_y_train,  color='black')\n",
+    "plt.scatter(diabetes_X_test, diabetes_y_test,  color='red')\n",
+    "\n",
+    "plt.plot(diabetes_X_test, predictions, color='black')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Uninstall/Clean up SystemML Python package and jar file"
+   ]
+  },
+  {
+   "cell_type": "raw",
+   "metadata": {},
+   "source": [
+    "!pip uninstall systemml --y"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 2",
+   "language": "python",
+   "name": "python2"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 2
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython2",
+   "version": "2.7.11"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}
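
The three linear-regression algorithms from the notebook above (direct solve, batch gradient descent with exact step size, and conjugate gradient) can be sketched in dependency-free Python for the single-feature-plus-intercept case. This is a minimal sketch under stated assumptions, not the notebook's DML: all function names are hypothetical, the 2x2 system is solved by Cramer's rule instead of `solve`, and the `tol` early-exit check is an addition that the notebook's fixed-iteration loops do not have.

```python
def normal_equations(xs, ys):
    """Build A = X'X and b = X'y for X = [x, 1] (feature plus constant column)."""
    n = float(len(xs))
    sx = float(sum(xs))
    sxx = float(sum(x * x for x in xs))
    sy = float(sum(ys))
    sxy = float(sum(x * y for x, y in zip(xs, ys)))
    return [[sxx, sx], [sx, n]], [sxy, sy]


def matvec(A, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def direct_solve(A, b):
    """Algorithm 1: solve the 2x2 system A w = b by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]


def gradient_descent(A, b, max_iter=100, tol=1e-18):
    """Algorithm 2: steepest descent, alpha = -(r'r)/(r'X'Xr)."""
    w = [0.0, 0.0]
    for _ in range(max_iter):
        dw = [g - bi for g, bi in zip(matvec(A, w), b)]  # dw = (X'X)w - X'y
        if dot(dw, dw) <= tol:        # assumption: stop once the gradient vanishes
            break
        alpha = -dot(dw, dw) / dot(dw, matvec(A, dw))
        w = [wi + alpha * di for wi, di in zip(w, dw)]
    return w


def conjugate_gradient(A, b, max_iter=20, tol=1e-18):
    """Algorithm 3: conjugate gradient on A w = b, starting from w = 0."""
    w = [0.0, 0.0]
    dw = [-bi for bi in b]            # gradient at w = 0 is -X'y
    p = [-d for d in dw]              # first direction: negative gradient
    norm_r2 = dot(dw, dw)
    for _ in range(max_iter):
        if norm_r2 <= tol:            # assumption: stop once the gradient vanishes
            break
        q = matvec(A, p)
        alpha = norm_r2 / dot(p, q)   # minimizes f(w + alpha*p)
        w = [wi + alpha * pi for wi, pi in zip(w, p)]
        dw = [di + alpha * qi for di, qi in zip(dw, q)]
        old_norm_r2, norm_r2 = norm_r2, dot(dw, dw)
        # next direction, conjugate to the previous one
        p = [-di + (norm_r2 / old_norm_r2) * pi for di, pi in zip(dw, p)]
    return w


# Usage on four points lying on y = 2x + 1; each result is [slope, bias]
A, b = normal_equations([0, 1, 2, 3], [1, 3, 5, 7])
w_ds = direct_solve(A, b)
w_gd = gradient_descent(A, b)
w_cg = conjugate_gradient(A, b)
```

On a two-parameter problem such as this one, conjugate gradient needs at most two steps in exact arithmetic, while steepest descent approaches the same solution gradually; that difference is the motivation the notebook gives for enforcing conjugacy.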
