systemml git commit: [Minor]: minor additions to notebooks.

reinwald Thu, 07 Dec 2017 16:34:58 -0800

Repository: systemml
Updated Branches:
  refs/heads/master fdc24bb7d -> 0ef6b9246



[Minor]: minor additions to notebooks.

Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/0ef6b924
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/0ef6b924
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/0ef6b924

Branch: refs/heads/master
Commit: 0ef6b924612951ccd003e8466fc9a911b098297f
Parents: fdc24bb
Author: Berthold Reinwald <reinw...@us.ibm.com>
Authored: Thu Dec 7 16:16:07 2017 -0800
Committer: Berthold Reinwald <reinw...@us.ibm.com>
Committed: Thu Dec 7 16:29:32 2017 -0800

----------------------------------------------------------------------
 .../Deep_Learning_Image_Classification.ipynb    | 316 ----------
 .../Linear_Regression_Algorithms_Demo.ipynb     | 599 -------------------
 2 files changed, 915 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/systemml/blob/0ef6b924/samples/jupyter-notebooks/Deep_Learning_Image_Classification.ipynb
----------------------------------------------------------------------
diff --git a/samples/jupyter-notebooks/Deep_Learning_Image_Classification.ipynb 
b/samples/jupyter-notebooks/Deep_Learning_Image_Classification.ipynb
deleted file mode 100644
index 42f249f..0000000
--- a/samples/jupyter-notebooks/Deep_Learning_Image_Classification.ipynb
+++ /dev/null
@@ -1,316 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Deep Learning Image Classification using Apache SystemML\n",
-    "\n",
-    "This notebook demonstrates how to train a deep learning model on SystemML 
for the classic [MNIST](http://yann.lecun.com/exdb/mnist/) problem of mapping 
images of single digit numbers to their corresponding numeric representations, 
using a classic [LeNet](http://yann.lecun.com/exdb/lenet/)-like convolutional 
neural network model. See [Neural Networks and Deep 
Learning](http://neuralnetworksanddeeplearning.com/chap6.html) for more 
information on neural networks and deep learning.\n",
-    "\n",
-    "The downloaded MNIST dataset contains labeled images of handwritten 
digits, where each example is a 28x28 pixel image of grayscale values in the 
range [0,255] stretched out as 784 pixels, and each label is one of 10 possible 
digits in [0,9].  We download 60,000 training examples, and 10,000 test 
examples, where the images and labels are stored in separate matrices.  We then 
train a SystemML LeNet-like convolutional neural network (i.e. \"convnet\", 
\"CNN\") model.  The resulting trained model has an accuracy of 98.6% on the 
test dataset.\n",
-    "\n",
-    "1. [Download the MNIST data](#download_data)\n",
-    "1. [Train a CNN classifier for MNIST handwritten digits](#train)\n",
-    "1. [Detect handwritten Digits](#predict)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "<div style=\"text-align:center\" markdown=\"1\">\n",
-    "![Image of Image to 
Digit](https://www.wolfram.com/mathematica/new-in-10/enhanced-image-processing/HTMLImages.en/handwritten-digits-classification/smallthumb_10.gif)\n",
-    "Mapping images of numbers to numbers\n",
-    "</div>"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Note: This notebook is supported with SystemML 0.14.0 and above."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "!pip show systemml"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "%matplotlib inline\n",
-    "\n",
-    "import matplotlib.pyplot as plt\n",
-    "import numpy as np\n",
-    "import pandas as pd\n",
-    "from sklearn import datasets\n",
-    "from sklearn.cross_validation import train_test_split  # module 
deprecated in 0.18\n",
-    "#from sklearn.model_selection import train_test_split  # use this module 
for >=0.18\n",
-    "from sklearn import metrics\n",
-    "from systemml import MLContext, dml"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "ml = MLContext(sc)\n",
-    "print(\"Spark Version: {}\".format(sc.version))\n",
-    "print(\"SystemML Version: {}\".format(ml.version()))\n",
-    "print(\"SystemML Built-Time: {}\".format(ml.buildTime()))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "<a id=\"download_data\"></a>\n",
-    "## Download the MNIST data\n",
-    "\n",
-    "Download the [MNIST data from the MLData 
repository](http://mldata.org/repository/data/viewslug/mnist-original/), and 
then split and save."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "mnist = datasets.fetch_mldata(\"MNIST Original\")\n",
-    "\n",
-    "print(\"MNIST data features: {}\".format(mnist.data.shape))\n",
-    "print(\"MNIST data labels: {}\".format(mnist.target.shape))\n",
-    "\n",
-    "X_train, X_test, y_train, y_test = train_test_split(\n",
-    "    mnist.data, mnist.target.astype(np.uint8).reshape(-1, 1),\n",
-    "    test_size = 10000)\n",
-    "\n",
-    "print(\"Training images, labels: {}, {}\".format(X_train.shape, 
y_train.shape))\n",
-    "print(\"Testing images, labels: {}, {}\".format(X_test.shape, 
y_test.shape))\n",
-    "print(\"Each image is: {0:d}x{0:d} 
pixels\".format(int(np.sqrt(X_train.shape[1]))))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Note: The following command is not required for code above SystemML 
0.14 (master branch dated 05/15/2017 or later)."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "!svn --force export https://github.com/apache/systemml/trunk/scripts/nn"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "<a id=\"train\"></a>\n",
-    "## Train a LeNet-like CNN classifier on the training data"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "<div style=\"text-align:center\" markdown=\"1\">\n",
-    "![Image of Image to 
Digit](http://www.ommegaonline.org/admin/journalassistance/picturegallery/896.jpg)\n",
-    "MNIST digit recognition - LeNet architecture\n",
-    "</div>"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Train a LeNet-like CNN model using SystemML"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "script = \"\"\"\n",
-    "  source(\"nn/examples/mnist_lenet.dml\") as mnist_lenet\n",
-    "\n",
-    "  # Scale images to [-1,1], and one-hot encode the labels\n",
-    "  images = (images / 255) * 2 - 1\n",
-    "  n = nrow(images)\n",
-    "  labels = table(seq(1, n), labels+1, n, 10)\n",
-    "\n",
-    "  # Split into training (55,000 examples) and validation (5,000 
examples)\n",
-    "  X = images[5001:nrow(images),]\n",
-    "  X_val = images[1:5000,]\n",
-    "  y = labels[5001:nrow(images),]\n",
-    "  y_val = labels[1:5000,]\n",
-    "\n",
-    "  # Train the model to produce weights & biases.\n",
-    "  [W1, b1, W2, b2, W3, b3, W4, b4] = mnist_lenet::train(X, y, X_val, 
y_val, C, Hin, Win, epochs)\n",
-    "\"\"\"\n",
-    "out = ('W1', 'b1', 'W2', 'b2', 'W3', 'b3', 'W4', 'b4')\n",
-    "prog = (dml(script).input(images=X_train, labels=y_train, epochs=1, C=1, 
Hin=28, Win=28)\n",
-    "                   .output(*out))\n",
-    "\n",
-    "W1, b1, W2, b2, W3, b3, W4, b4 = ml.execute(prog).get(*out)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Use the trained model to make predictions for the test data, and evaluate 
the quality of the predictions."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "script_predict = \"\"\"\n",
-    "  source(\"nn/examples/mnist_lenet.dml\") as mnist_lenet\n",
-    "\n",
-    "  # Scale images to [-1,1]\n",
-    "  X_test = (X_test / 255) * 2 - 1\n",
-    "\n",
-    "  # Predict\n",
-    "  y_prob = mnist_lenet::predict(X_test, C, Hin, Win, W1, b1, W2, b2, W3, 
b3, W4, b4)\n",
-    "  y_pred = rowIndexMax(y_prob) - 1\n",
-    "\"\"\"\n",
-    "prog = (dml(script_predict).input(X_test=X_test, C=1, Hin=28, Win=28, 
W1=W1, b1=b1,\n",
-    "                                  W2=W2, b2=b2, W3=W3, b3=b3, W4=W4, 
b4=b4)\n",
-    "                           .output(\"y_pred\"))\n",
-    "\n",
-    "y_pred = ml.execute(prog).get(\"y_pred\").toNumPy()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "print(metrics.accuracy_score(y_test, y_pred))\n",
-    "print(metrics.classification_report(y_test, y_pred))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "<a id=\"predict\"></a>\n",
-    "## Detect handwritten digits"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Define a function that randomly selects a test image, displays the image, 
and scores it."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "img_size = int(np.sqrt(X_test.shape[1]))\n",
-    "\n",
-    "def displayImage(i):\n",
-    "  image = (X_test[i]).reshape(img_size, img_size).astype(np.uint8)\n",
-    "  imgplot = plt.imshow(image, cmap='gray')   "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "def predictImage(i):\n",
-    "  image = X_test[i].reshape(1, -1)\n",
-    "  out = \"y_pred\"\n",
-    "  prog = (dml(script_predict).input(X_test=image, C=1, Hin=28, Win=28, 
W1=W1, b1=b1,\n",
-    "                                    W2=W2, b2=b2, W3=W3, b3=b3, W4=W4, 
b4=b4)\n",
-    "                             .output(out))\n",
-    "  pred = int(ml.execute(prog).get(out).toNumPy())\n",
-    "  return pred"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "i = np.random.randint(len(X_test))\n",
-    "p = predictImage(i)\n",
-    "\n",
-    "print(\"Image {}\\nPredicted digit: {}\\nActual digit: {}\\nResult: 
{}\".format(\n",
-    "    i, p, int(y_test[i]), p == int(y_test[i])))\n",
-    "\n",
-    "displayImage(i)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "pd.set_option('display.max_columns', 28)\n",
-    "pd.DataFrame((X_test[i]).reshape(img_size, img_size), dtype='uint')"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3 + Spark 2.x + SystemML",
-   "language": "python",
-   "name": "pyspark3_2.x"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.6.1"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 1
-}
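
For readers without a SystemML installation, the preprocessing performed at the top of the removed DML training script (scale pixels to [-1,1], one-hot encode the labels, hold out the first 5,000 rows for validation) can be sketched in NumPy. This is only an illustration: the function name `preprocess` and its parameters are hypothetical and were not part of the notebook.

```python
import numpy as np

def preprocess(images, labels, num_classes=10, n_val=5000):
    """Hypothetical helper mirroring the DML preprocessing steps."""
    images = np.asarray(images, dtype=np.float64)
    labels = np.asarray(labels).reshape(-1).astype(int)

    # Scale grayscale values from [0, 255] to [-1, 1].
    X = (images / 255.0) * 2.0 - 1.0

    # One-hot encode: row i gets a 1 in column labels[i].
    # (DML is 1-based, hence labels+1 in the original script.)
    Y = np.zeros((X.shape[0], num_classes))
    Y[np.arange(X.shape[0]), labels] = 1.0

    # The first n_val rows become the validation set, the rest training.
    return X[n_val:], Y[n_val:], X[:n_val], Y[:n_val]
```

With the notebook's 60,000 training rows and the default n_val, this would yield 55,000 training and 5,000 validation examples, matching the comment in the DML script.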

http://git-wip-us.apache.org/repos/asf/systemml/blob/0ef6b924/samples/jupyter-notebooks/Linear_Regression_Algorithms_Demo.ipynb
----------------------------------------------------------------------
diff --git a/samples/jupyter-notebooks/Linear_Regression_Algorithms_Demo.ipynb 
b/samples/jupyter-notebooks/Linear_Regression_Algorithms_Demo.ipynb
deleted file mode 100644
index 681b277..0000000
--- a/samples/jupyter-notebooks/Linear_Regression_Algorithms_Demo.ipynb
+++ /dev/null
@@ -1,599 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Linear Regression Algorithms using Apache SystemML\n",
-    "\n",
-    "This notebook shows:\n",
-    "- Install SystemML Python package and jar file\n",
-    "  - pip\n",
-    "  - SystemML 'Hello World'\n",
-    "- Example 1: Matrix Multiplication\n",
-    "  - SystemML script to generate a random matrix, perform matrix 
multiplication, and compute the sum of the output\n",
-    "  - Examine execution plans, and increase data size to observe changed 
execution plans\n",
-    "- Load diabetes dataset from scikit-learn\n",
-    "- Example 2: Implement three different algorithms to train linear 
regression model\n",
-    "  - Algorithm 1: Linear Regression - Direct Solve (no regularization)\n",
-    "  - Algorithm 2: Linear Regression - Batch Gradient Descent (no 
regularization)\n",
-    "  - Algorithm 3: Linear Regression - Conjugate Gradient (no 
regularization)\n",
-    "- Example 3: Invoke existing SystemML algorithm script LinearRegDS.dml 
using MLContext API\n",
-    "- Example 4: Invoke existing SystemML algorithm using 
scikit-learn/SparkML pipeline like API\n",
-    "- Uninstall/Clean up SystemML Python package and jar file"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### This notebook is supported with SystemML 0.14.0 and above."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "!pip show systemml"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Import SystemML API "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from systemml import MLContext, dml, dmlFromResource\n",
-    "\n",
-    "ml = MLContext(sc)\n",
-    "\n",
-    "print (\"Spark Version:\" + sc.version)\n",
-    "print (\"SystemML Version:\" + ml.version())\n",
-    "print (\"SystemML Built-Time:\"+ ml.buildTime())"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "ml.execute(dml(\"\"\"s = 'Hello World!'\"\"\").output(\"s\")).get(\"s\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Import numpy, sklearn, and define some helper functions"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "import sys, os, glob, subprocess\n",
-    "import matplotlib.pyplot as plt\n",
-    "import numpy as np\n",
-    "from sklearn import datasets\n",
-    "plt.switch_backend('agg')\n",
-    "    \n",
-    "def printLastLogLines(n):\n",
-    "    fname = 
max(glob.iglob(os.sep.join([os.environ[\"HOME\"],'/logs/notebook/kernel-pyspark-*.log'])),
 key=os.path.getctime)\n",
-    "    print(subprocess.check_output(['tail', '-' + str(n), fname]))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Example 1: Matrix Multiplication"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### SystemML script to generate a random matrix, perform matrix 
multiplication, and compute the sum of the output"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true,
-    "slideshow": {
-     "slide_type": "-"
-    }
-   },
-   "outputs": [],
-   "source": [
-    "script = \"\"\"\n",
-    "    X = rand(rows=$nr, cols=1000, sparsity=0.5)\n",
-    "    A = t(X) %*% X\n",
-    "    s = sum(A)\n",
-    "\"\"\""
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "prog = dml(script).input('$nr', 1e5).output('s')\n",
-    "s = ml.execute(prog).get('s')\n",
-    "print (s)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Examine execution plans, and increase data size to observe changed 
execution plans"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true,
-    "scrolled": false
-   },
-   "outputs": [],
-   "source": [
-    "ml = MLContext(sc)\n",
-    "ml = ml.setStatistics(True)\n",
-    "# re-execute ML program\n",
-    "# printLastLogLines(22)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "prog = dml(script).input('$nr', 1e6).output('s')\n",
-    "out = ml.execute(prog).get('s')\n",
-    "print (out)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "ml = MLContext(sc)\n",
-    "ml = ml.setStatistics(False)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Load diabetes dataset from scikit-learn "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "%matplotlib inline"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "diabetes = datasets.load_diabetes()\n",
-    "diabetes_X = diabetes.data[:, np.newaxis, 2]\n",
-    "diabetes_X_train = diabetes_X[:-20]\n",
-    "diabetes_X_test = diabetes_X[-20:]\n",
-    "diabetes_y_train = diabetes.target[:-20].reshape(-1,1)\n",
-    "diabetes_y_test = diabetes.target[-20:].reshape(-1,1)\n",
-    "\n",
-    "plt.scatter(diabetes_X_train, diabetes_y_train,  color='black')\n",
-    "plt.scatter(diabetes_X_test, diabetes_y_test,  color='red')"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "diabetes.data.shape"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Example 2: Implement three different algorithms to train linear 
regression model"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "collapsed": true
-   },
-   "source": [
-    "## Algorithm 1: Linear Regression - Direct Solve (no regularization) "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Least squares formulation\n",
-    "w* = argmin_w ||Xw-y||^2 = argmin_w (y - Xw)'(y - Xw) = argmin_w (w'(X'X)w)/2 - w'(X'y)\n",
-    "\n",
-    "#### Setting the gradient\n",
-    "dw = (X'X)w - (X'y) to 0, w = (X'X)^-1 (X'y) = solve(X'X, X'y)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "script = \"\"\"\n",
-    "    # add constant feature to X to model intercept\n",
-    "    X = cbind(X, matrix(1, rows=nrow(X), cols=1))\n",
-    "    A = t(X) %*% X\n",
-    "    b = t(X) %*% y\n",
-    "    w = solve(A, b)\n",
-    "    bias = as.scalar(w[nrow(w),1])\n",
-    "    w = w[1:nrow(w)-1,]\n",
-    "\"\"\""
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true,
-    "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "prog = dml(script).input(X=diabetes_X_train, 
y=diabetes_y_train).output('w', 'bias')\n",
-    "w, bias = ml.execute(prog).get('w','bias')\n",
-    "w = w.toNumPy()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "plt.scatter(diabetes_X_train, diabetes_y_train,  color='black')\n",
-    "plt.scatter(diabetes_X_test, diabetes_y_test,  color='red')\n",
-    "\n",
-    "plt.plot(diabetes_X_test, (w*diabetes_X_test)+bias, color='blue', 
linestyle ='dotted')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "collapsed": true
-   },
-   "source": [
-    "## Algorithm 2: Linear Regression - Batch Gradient Descent (no 
regularization)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "#### Algorithm\n",
-    "`Step 1: Start with an initial point \n",
-    "while(not converged) { \n",
-    "  Step 2: Compute gradient dw. \n",
-    "  Step 3: Compute stepsize alpha.     \n",
-    "  Step 4: Update: w_new = w_old + alpha*dw \n",
-    "}`\n",
-    "\n",
-    "#### Gradient formula\n",
-    "`dw = r = (X'X)w - (X'y)`\n",
-    "\n",
-    "#### Step size formula\n",
-    "`Find number alpha to minimize f(w + alpha*r) \n",
-    "alpha = -(r'r)/(r'X'Xr)`\n",
-    "\n",
-    "![Gradient 
Descent](http://blog.datumbox.com/wp-content/uploads/2013/10/gradient-descent.png)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "script = \"\"\"\n",
-    "    # add constant feature to X to model intercepts\n",
-    "    X = cbind(X, matrix(1, rows=nrow(X), cols=1))\n",
-    "    max_iter = 100\n",
-    "    w = matrix(0, rows=ncol(X), cols=1)\n",
-    "    for(i in 1:max_iter){\n",
-    "        XtX = t(X) %*% X\n",
-    "        dw = XtX %*%w - t(X) %*% y\n",
-    "        alpha = -(t(dw) %*% dw) / (t(dw) %*% XtX %*% dw)\n",
-    "        w = w + dw*alpha\n",
-    "    }\n",
-    "    bias = as.scalar(w[nrow(w),1])\n",
-    "    w = w[1:nrow(w)-1,]    \n",
-    "\"\"\""
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "prog = dml(script).input(X=diabetes_X_train, 
y=diabetes_y_train).output('w').output('bias')\n",
-    "w, bias = ml.execute(prog).get('w', 'bias')\n",
-    "w = w.toNumPy()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "scrolled": false
-   },
-   "outputs": [],
-   "source": [
-    "plt.scatter(diabetes_X_train, diabetes_y_train,  color='black')\n",
-    "plt.scatter(diabetes_X_test, diabetes_y_test,  color='red')\n",
-    "\n",
-    "plt.plot(diabetes_X_test, (w*diabetes_X_test)+bias, color='red', 
linestyle ='dashed')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Algorithm 3: Linear Regression - Conjugate Gradient (no regularization)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Problem with gradient descent: Takes very similar directions many 
times\n",
-    "\n",
-    "Solution: Enforce conjugacy\n",
-    "\n",
-    "`Step 1: Start with an initial point \n",
-    "while(not converged) {\n",
-    "   Step 2: Compute gradient dw.\n",
-    "   Step 3: Compute stepsize alpha.\n",
-    "   Step 4: Compute next direction p by enforcing conjugacy with previous 
direction.\n",
-    "   Step 5: Update: w_new = w_old + alpha*p\n",
-    "}`\n",
-    "\n",
-    "![Gradient Descent vs Conjugate 
Gradient](http://i.stack.imgur.com/zh1HH.png)\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "script = \"\"\"\n",
-    "    # add constant feature to X to model intercepts\n",
-    "    X = cbind(X, matrix(1, rows=nrow(X), cols=1))\n",
-    "    m = ncol(X); i = 1; \n",
-    "    max_iter = 20;\n",
-    "    w = matrix (0, rows = m, cols = 1); # initialize weights to 0\n",
-    "    dw = - t(X) %*% y; p = - dw;        # dw = (X'X)w - (X'y)\n",
-    "    norm_r2 = sum (dw ^ 2); \n",
-    "    for(i in 1:max_iter) {\n",
-    "        q = t(X) %*% (X %*% p)\n",
-    "        alpha = norm_r2 / sum (p * q);  # Minimizes f(w + alpha*p)\n",
-    "        w = w + alpha * p;              # update weights\n",
-    "        dw = dw + alpha * q;           \n",
-    "        old_norm_r2 = norm_r2; norm_r2 = sum (dw ^ 2);\n",
-    "        p = -dw + (norm_r2 / old_norm_r2) * p; # next direction - 
conjugacy to previous direction\n",
-    "        i = i + 1;\n",
-    "    }\n",
-    "    bias = as.scalar(w[nrow(w),1])\n",
-    "    w = w[1:nrow(w)-1,]    \n",
-    "\"\"\""
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "prog = dml(script).input(X=diabetes_X_train, 
y=diabetes_y_train).output('w').output('bias')\n",
-    "w, bias = ml.execute(prog).get('w','bias')\n",
-    "w = w.toNumPy()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "scrolled": false
-   },
-   "outputs": [],
-   "source": [
-    "plt.scatter(diabetes_X_train, diabetes_y_train,  color='black')\n",
-    "plt.scatter(diabetes_X_test, diabetes_y_test,  color='red')\n",
-    "\n",
-    "plt.plot(diabetes_X_test, (w*diabetes_X_test)+bias, color='red', 
linestyle ='dashed')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Example 3: Invoke existing SystemML algorithm script LinearRegDS.dml 
using MLContext API"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import os\n",
-    "from subprocess import call\n",
-    "\n",
-    "dirName = os.path.dirname(os.path.realpath(\"~\")) + \"/scripts\"\n",
-    "call([\"mkdir\", \"-p\", dirName])\n",
-    "call([\"wget\", \"-N\", \"-q\", \"-P\", dirName, 
\"https://raw.githubusercontent.com/apache/systemml/master/scripts/algorithms/LinearRegDS.dml\"])\n",
-    "\n",
-    "scriptName = dirName + \"/LinearRegDS.dml\"\n",
-    "dml_script = dmlFromResource(scriptName)\n",
-    "\n",
-    "prog = dml_script.input(X=diabetes_X_train, 
y=diabetes_y_train).input('$icpt',1.0).output('beta_out')\n",
-    "w = ml.execute(prog).get('beta_out')\n",
-    "w = w.toNumPy()\n",
-    "bias=w[1]\n",
-    "print (bias)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "plt.scatter(diabetes_X_train, diabetes_y_train,  color='black')\n",
-    "plt.scatter(diabetes_X_test, diabetes_y_test,  color='red')\n",
-    "\n",
-    "plt.plot(diabetes_X_test, (w[0]*diabetes_X_test)+bias, color='red', 
linestyle ='dashed')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Example 4: Invoke existing SystemML algorithm using 
scikit-learn/SparkML pipeline like API"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "*mllearn* API allows a Python programmer to invoke SystemML's algorithms 
using scikit-learn like API as well as Spark's MLPipeline API."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "from pyspark.sql import SQLContext\n",
-    "from systemml.mllearn import LinearRegression\n",
-    "sqlCtx = SQLContext(sc)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "regr = LinearRegression(sqlCtx)\n",
-    "# Train the model using the training sets\n",
-    "regr.fit(diabetes_X_train, diabetes_y_train)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "predictions = regr.predict(diabetes_X_test)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Use the trained model to perform prediction\n",
-    "%matplotlib inline\n",
-    "plt.scatter(diabetes_X_train, diabetes_y_train,  color='black')\n",
-    "plt.scatter(diabetes_X_test, diabetes_y_test,  color='red')\n",
-    "\n",
-    "plt.plot(diabetes_X_test, predictions, color='black')"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 2",
-   "language": "python",
-   "name": "python2"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 2
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython2",
-   "version": "2.7.13"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 1
-}
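
The three training algorithms in the removed notebook (direct solve, batch gradient descent with an exact step size, and conjugate gradient) can be compared outside SystemML with a NumPy sketch along the following lines. The function names are hypothetical, and the early-exit checks (`denom == 0.0`, `tol`) are assumptions added here to avoid dividing by zero once the residual vanishes; the DML scripts simply ran a fixed number of iterations.

```python
import numpy as np

def add_intercept(X):
    # Append a constant column so the last weight models the intercept.
    return np.hstack([X, np.ones((X.shape[0], 1))])

def direct_solve(X, y):
    # Solve the normal equations (X'X) w = X'y.
    X = add_intercept(X)
    return np.linalg.solve(X.T @ X, X.T @ y)

def gradient_descent(X, y, max_iter=100):
    X = add_intercept(X)
    XtX, Xty = X.T @ X, X.T @ y
    w = np.zeros((X.shape[1], 1))
    for _ in range(max_iter):
        dw = XtX @ w - Xty                    # gradient
        denom = (dw.T @ XtX @ dw).item()
        if denom == 0.0:                      # assumption: stop at zero gradient
            break
        alpha = -(dw.T @ dw).item() / denom   # exact line-search step
        w = w + alpha * dw
    return w

def conjugate_gradient(X, y, max_iter=20, tol=1e-20):
    X = add_intercept(X)
    w = np.zeros((X.shape[1], 1))
    dw = -(X.T @ y)                           # gradient at w = 0
    p = -dw                                   # first search direction
    norm_r2 = float(np.sum(dw ** 2))
    norm_r0 = norm_r2
    for _ in range(max_iter):
        if norm_r2 <= tol * norm_r0:          # assumption: relative stopping rule
            break
        q = X.T @ (X @ p)
        alpha = norm_r2 / float(np.sum(p * q))
        w = w + alpha * p
        dw = dw + alpha * q
        old_norm_r2, norm_r2 = norm_r2, float(np.sum(dw ** 2))
        p = -dw + (norm_r2 / old_norm_r2) * p  # conjugate to previous direction
    return w
```

Each function returns a column vector whose last entry is the intercept, so `w[:-1]` and `w[-1]` correspond to `w` and `bias` in the DML scripts. On well-conditioned data all three should agree closely; on poorly scaled features, gradient descent may need far more than 100 iterations.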
