Repository: systemml
Updated Branches:
  refs/heads/master fdc24bb7d -> 0ef6b9246

[Minor]: minor additions to notebooks.

Project: http://git-wip-us.apache.org/repos/asf/systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/systemml/commit/0ef6b924
Tree: http://git-wip-us.apache.org/repos/asf/systemml/tree/0ef6b924
Diff: http://git-wip-us.apache.org/repos/asf/systemml/diff/0ef6b924

Branch: refs/heads/master
Commit: 0ef6b924612951ccd003e8466fc9a911b098297f
Parents: fdc24bb
Author: Berthold Reinwald <reinw...@us.ibm.com>
Authored: Thu Dec 7 16:16:07 2017 -0800
Committer: Berthold Reinwald <reinw...@us.ibm.com>
Committed: Thu Dec 7 16:29:32 2017 -0800

----------------------------------------------------------------------
 .../Deep_Learning_Image_Classification.ipynb | 316 ----------
 .../Linear_Regression_Algorithms_Demo.ipynb | 599 -------------------
 2 files changed, 915 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/systemml/blob/0ef6b924/samples/jupyter-notebooks/Deep_Learning_Image_Classification.ipynb
----------------------------------------------------------------------
diff --git a/samples/jupyter-notebooks/Deep_Learning_Image_Classification.ipynb b/samples/jupyter-notebooks/Deep_Learning_Image_Classification.ipynb
deleted file mode 100644
index 42f249f..0000000
--- a/samples/jupyter-notebooks/Deep_Learning_Image_Classification.ipynb
+++ /dev/null
@@ -1,316 +0,0 @@
-{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Deep Learning Image Classification using Apache SystemML\n", - "\n", - "This notebook demonstrates how to train a deep learning model on SystemML for the classic [MNIST](http://yann.lecun.com/exdb/mnist/) problem of mapping images of single digit numbers to their corresponding numeric representations, using a classic [LeNet](http://yann.lecun.com/exdb/lenet/)-like convolutional neural network model. 
See [Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com/chap6.html) for more information on neural networks and deep learning.\n", - "\n", - "The downloaded MNIST dataset contains labeled images of handwritten digits, where each example is a 28x28 pixel image of grayscale values in the range [0,255] stretched out as 784 pixels, and each label is one of 10 possible digits in [0,9]. We download 60,000 training examples, and 10,000 test examples, where the images and labels are stored in separate matrices. We then train a SystemML LeNet-like convolutional neural network (i.e. \"convnet\", \"CNN\") model. The resulting trained model has an accuracy of 98.6% on the test dataset.\n", - "\n", - "1. [Download the MNIST data](#download_data)\n", - "1. [Train a CNN classifier for MNIST handwritten digits](#train)\n", - "1. [Detect handwritten Digits](#predict)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "<div style=\"text-align:center\" markdown=\"1\">\n", - "![Image of Image to Digit](https://www.wolfram.com/mathematica/new-in-10/enhanced-image-processing/HTMLImages.en/handwritten-digits-classification/smallthumb_10.gif)\n", - "Mapping images of numbers to numbers\n", - "</div>" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Note: This notebook is supported with SystemML 0.14.0 and above." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "!pip show systemml" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "%matplotlib inline\n", - "\n", - "import matplotlib.pyplot as plt\n", - "import numpy as np\n", - "import pandas as pd\n", - "from sklearn import datasets\n", - "from sklearn.cross_validation import train_test_split # module deprecated in 0.18\n", - "#from sklearn.model_selection import train_test_split # use this module for >=0.18\n", - "from sklearn import metrics\n", - "from systemml import MLContext, dml" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ml = MLContext(sc)\n", - "print(\"Spark Version: {}\".format(sc.version))\n", - "print(\"SystemML Version: {}\".format(ml.version()))\n", - "print(\"SystemML Built-Time: {}\".format(ml.buildTime()))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "<a id=\"download_data\"></a>\n", - "## Download the MNIST data\n", - "\n", - "Download the [MNIST data from the MLData repository](http://mldata.org/repository/data/viewslug/mnist-original/), and then split and save." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "mnist = datasets.fetch_mldata(\"MNIST Original\")\n", - "\n", - "print(\"MNIST data features: {}\".format(mnist.data.shape))\n", - "print(\"MNIST data labels: {}\".format(mnist.target.shape))\n", - "\n", - "X_train, X_test, y_train, y_test = train_test_split(\n", - " mnist.data, mnist.target.astype(np.uint8).reshape(-1, 1),\n", - " test_size = 10000)\n", - "\n", - "print(\"Training images, labels: {}, {}\".format(X_train.shape, y_train.shape))\n", - "print(\"Testing images, labels: {}, {}\".format(X_test.shape, y_test.shape))\n", - "print(\"Each image is: {0:d}x{0:d} pixels\".format(int(np.sqrt(X_train.shape[1]))))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Note: The following command is not required for code above SystemML 0.14 (master branch dated 05/15/2017 or later)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "!svn --force export https://github.com/apache/systemml/trunk/scripts/nn" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "<a id=\"train\"></a>\n", - "## Train a LeNet-like CNN classifier on the training data" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "<div style=\"text-align:center\" markdown=\"1\">\n", - "![Image of Image to Digit](http://www.ommegaonline.org/admin/journalassistance/picturegallery/896.jpg)\n", - "MNIST digit recognition - LeNet architecture\n", - "</div>" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Train a LeNet-like CNN model using SystemML" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "script = \"\"\"\n", - " source(\"nn/examples/mnist_lenet.dml\") as mnist_lenet\n", - "\n", - " # Scale images to [-1,1], and one-hot encode the
labels\n", - " images = (images / 255) * 2 - 1\n", - " n = nrow(images)\n", - " labels = table(seq(1, n), labels+1, n, 10)\n", - "\n", - " # Split into training (55,000 examples) and validation (5,000 examples)\n", - " X = images[5001:nrow(images),]\n", - " X_val = images[1:5000,]\n", - " y = labels[5001:nrow(images),]\n", - " y_val = labels[1:5000,]\n", - "\n", - " # Train the model to produce weights & biases.\n", - " [W1, b1, W2, b2, W3, b3, W4, b4] = mnist_lenet::train(X, y, X_val, y_val, C, Hin, Win, epochs)\n", - "\"\"\"\n", - "out = ('W1', 'b1', 'W2', 'b2', 'W3', 'b3', 'W4', 'b4')\n", - "prog = (dml(script).input(images=X_train, labels=y_train, epochs=1, C=1, Hin=28, Win=28)\n", - " .output(*out))\n", - "\n", - "W1, b1, W2, b2, W3, b3, W4, b4 = ml.execute(prog).get(*out)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Use the trained model to make predictions for the test data, and evaluate the quality of the predictions." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "script_predict = \"\"\"\n", - " source(\"nn/examples/mnist_lenet.dml\") as mnist_lenet\n", - "\n", - " # Scale images to [-1,1]\n", - " X_test = (X_test / 255) * 2 - 1\n", - "\n", - " # Predict\n", - " y_prob = mnist_lenet::predict(X_test, C, Hin, Win, W1, b1, W2, b2, W3, b3, W4, b4)\n", - " y_pred = rowIndexMax(y_prob) - 1\n", - "\"\"\"\n", - "prog = (dml(script_predict).input(X_test=X_test, C=1, Hin=28, Win=28, W1=W1, b1=b1,\n", - " W2=W2, b2=b2, W3=W3, b3=b3, W4=W4, b4=b4)\n", - " .output(\"y_pred\"))\n", - "\n", - "y_pred = ml.execute(prog).get(\"y_pred\").toNumPy()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(metrics.accuracy_score(y_test, y_pred))\n", - "print(metrics.classification_report(y_test, y_pred))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "<a 
id=\"predict\"></a>\n", - "## Detect handwritten digits" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Define a function that randomly selects a test image, displays the image, and scores it." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "img_size = int(np.sqrt(X_test.shape[1]))\n", - "\n", - "def displayImage(i):\n", - " image = (X_test[i]).reshape(img_size, img_size).astype(np.uint8)\n", - " imgplot = plt.imshow(image, cmap='gray') " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "def predictImage(i):\n", - " image = X_test[i].reshape(1, -1)\n", - " out = \"y_pred\"\n", - " prog = (dml(script_predict).input(X_test=image, C=1, Hin=28, Win=28, W1=W1, b1=b1,\n", - " W2=W2, b2=b2, W3=W3, b3=b3, W4=W4, b4=b4)\n", - " .output(out))\n", - " pred = int(ml.execute(prog).get(out).toNumPy())\n", - " return pred" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "i = np.random.randint(len(X_test))\n", - "p = predictImage(i)\n", - "\n", - "print(\"Image {}\\nPredicted digit: {}\\nActual digit: {}\\nResult: {}\".format(\n", - " i, p, int(y_test[i]), p == int(y_test[i])))\n", - "\n", - "displayImage(i)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "pd.set_option('display.max_columns', 28)\n", - "pd.DataFrame((X_test[i]).reshape(img_size, img_size), dtype='uint')" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3 + Spark 2.x + SystemML", - "language": "python", - "name": "pyspark3_2.x" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - 
"version": "3.6.1" - } - }, - "nbformat": 4, - "nbformat_minor": 1 -}

http://git-wip-us.apache.org/repos/asf/systemml/blob/0ef6b924/samples/jupyter-notebooks/Linear_Regression_Algorithms_Demo.ipynb
----------------------------------------------------------------------
diff --git a/samples/jupyter-notebooks/Linear_Regression_Algorithms_Demo.ipynb b/samples/jupyter-notebooks/Linear_Regression_Algorithms_Demo.ipynb
deleted file mode 100644
index 681b277..0000000
--- a/samples/jupyter-notebooks/Linear_Regression_Algorithms_Demo.ipynb
+++ /dev/null
@@ -1,599 +0,0 @@
-{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Linear Regression Algorithms using Apache SystemML\n", - "\n", - "This notebook shows:\n", - "- Install SystemML Python package and jar file\n", - " - pip\n", - " - SystemML 'Hello World'\n", - "- Example 1: Matrix Multiplication\n", - " - SystemML script to generate a random matrix, perform matrix multiplication, and compute the sum of the output\n", - " - Examine execution plans, and increase data size to observe changed execution plans\n", - "- Load diabetes dataset from scikit-learn\n", - "- Example 2: Implement three different algorithms to train linear regression model\n", - " - Algorithm 1: Linear Regression - Direct Solve (no regularization)\n", - " - Algorithm 2: Linear Regression - Batch Gradient Descent (no regularization)\n", - " - Algorithm 3: Linear Regression - Conjugate Gradient (no regularization)\n", - "- Example 3: Invoke existing SystemML algorithm script LinearRegDS.dml using MLContext API\n", - "- Example 4: Invoke existing SystemML algorithm using scikit-learn/SparkML pipeline like API\n", - "- Uninstall/Clean up SystemML Python package and jar file" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### This notebook is supported with SystemML 0.14.0 and above."
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true - }, - "outputs": [], - "source": [ - "!pip show systemml" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Import SystemML API " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from systemml import MLContext, dml, dmlFromResource\n", - "\n", - "ml = MLContext(sc)\n", - "\n", - "print (\"Spark Version:\" + sc.version)\n", - "print (\"SystemML Version:\" + ml.version())\n", - "print (\"SystemML Built-Time:\"+ ml.buildTime())" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "ml.execute(dml(\"\"\"s = 'Hello World!'\"\"\").output(\"s\")).get(\"s\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Import numpy, sklearn, and define some helper functions" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "import sys, os, glob, subprocess\n", - "import matplotlib.pyplot as plt\n", - "import numpy as np\n", - "from sklearn import datasets\n", - "plt.switch_backend('agg')\n", - " \n", - "def printLastLogLines(n):\n", - " fname = max(glob.iglob(os.sep.join([os.environ[\"HOME\"],'/logs/notebook/kernel-pyspark-*.log'])), key=os.path.getctime)\n", - " print(subprocess.check_output(['tail', '-' + str(n), fname]))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Example 1: Matrix Multiplication" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### SystemML script to generate a random matrix, perform matrix multiplication, and compute the sum of the output" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true, - "slideshow": { - "slide_type": "-" - } - }, - "outputs": [], - "source": [ - "script = \"\"\"\n", - 
" X = rand(rows=$nr, cols=1000, sparsity=0.5)\n", - " A = t(X) %*% X\n", - " s = sum(A)\n", - "\"\"\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "prog = dml(script).input('$nr', 1e5).output('s')\n", - "s = ml.execute(prog).get('s')\n", - "print (s)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Examine execution plans, and increase data size to observe changed execution plans" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true, - "scrolled": false - }, - "outputs": [], - "source": [ - "ml = MLContext(sc)\n", - "ml = ml.setStatistics(True)\n", - "# re-execute ML program\n", - "# printLastLogLines(22)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "prog = dml(script).input('$nr', 1e6).output('s')\n", - "out = ml.execute(prog).get('s')\n", - "print (out)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "ml = MLContext(sc)\n", - "ml = ml.setStatistics(False)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Load diabetes dataset from scikit-learn " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "%matplotlib inline" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "diabetes = datasets.load_diabetes()\n", - "diabetes_X = diabetes.data[:, np.newaxis, 2]\n", - "diabetes_X_train = diabetes_X[:-20]\n", - "diabetes_X_test = diabetes_X[-20:]\n", - "diabetes_y_train = diabetes.target[:-20].reshape(-1,1)\n", - "diabetes_y_test = diabetes.target[-20:].reshape(-1,1)\n", - "\n", - "plt.scatter(diabetes_X_train, diabetes_y_train, color='black')\n", - "plt.scatter(diabetes_X_test, diabetes_y_test, 
color='red')" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "diabetes.data.shape" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Example 2: Implement three different algorithms to train linear regression model" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": true - }, - "source": [ - "## Algorithm 1: Linear Regression - Direct Solve (no regularization) " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Least squares formulation\n", - "w* = argminw ||Xw-y||2 = argminw (y - Xw)'(y - Xw) = argminw (w'(X'X)w - w'(X'y))/2\n", - "\n", - "#### Setting the gradient\n", - "dw = (X'X)w - (X'y) to 0, w = (X'X)-1(X' y) = solve(X'X, X'y)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "script = \"\"\"\n", - " # add constant feature to X to model intercept\n", - " X = cbind(X, matrix(1, rows=nrow(X), cols=1))\n", - " A = t(X) %*% X\n", - " b = t(X) %*% y\n", - " w = solve(A, b)\n", - " bias = as.scalar(w[nrow(w),1])\n", - " w = w[1:nrow(w)-1,]\n", - "\"\"\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true, - "scrolled": true - }, - "outputs": [], - "source": [ - "prog = dml(script).input(X=diabetes_X_train, y=diabetes_y_train).output('w', 'bias')\n", - "w, bias = ml.execute(prog).get('w','bias')\n", - "w = w.toNumPy()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": true - }, - "outputs": [], - "source": [ - "plt.scatter(diabetes_X_train, diabetes_y_train, color='black')\n", - "plt.scatter(diabetes_X_test, diabetes_y_test, color='red')\n", - "\n", - "plt.plot(diabetes_X_test, (w*diabetes_X_test)+bias, color='blue', linestyle ='dotted')" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "collapsed": true - }, - "source": [ - "## Algorithm 
2: Linear Regression - Batch Gradient Descent (no regularization)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "#### Algorithm\n", - "`Step 1: Start with an initial point \n", - "while(not converged) { \n", - " Step 2: Compute gradient dw. \n", - " Step 3: Compute stepsize alpha. \n", - " Step 4: Update: wnew = wold + alpha*dw \n", - "}`\n", - "\n", - "#### Gradient formula\n", - "`dw = r = (X'X)w - (X'y)`\n", - "\n", - "#### Step size formula\n", - "`Find number alpha to minimize f(w + alpha*r) \n", - "alpha = -(r'r)/(r'X'Xr)`\n", - "\n", - "![Gradient Descent](http://blog.datumbox.com/wp-content/uploads/2013/10/gradient-descent.png)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "script = \"\"\"\n", - " # add constant feature to X to model intercepts\n", - " X = cbind(X, matrix(1, rows=nrow(X), cols=1))\n", - " max_iter = 100\n", - " w = matrix(0, rows=ncol(X), cols=1)\n", - " for(i in 1:max_iter){\n", - " XtX = t(X) %*% X\n", - " dw = XtX %*%w - t(X) %*% y\n", - " alpha = -(t(dw) %*% dw) / (t(dw) %*% XtX %*% dw)\n", - " w = w + dw*alpha\n", - " }\n", - " bias = as.scalar(w[nrow(w),1])\n", - " w = w[1:nrow(w)-1,] \n", - "\"\"\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "prog = dml(script).input(X=diabetes_X_train, y=diabetes_y_train).output('w').output('bias')\n", - "w, bias = ml.execute(prog).get('w', 'bias')\n", - "w = w.toNumPy()" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "scrolled": false - }, - "outputs": [], - "source": [ - "plt.scatter(diabetes_X_train, diabetes_y_train, color='black')\n", - "plt.scatter(diabetes_X_test, diabetes_y_test, color='red')\n", - "\n", - "plt.plot(diabetes_X_test, (w*diabetes_X_test)+bias, color='red', linestyle ='dashed')" - ] - }, - { - "cell_type": "markdown", - "metadata": 
{}, - "source": [ - "# Algorithm 3: Linear Regression - Conjugate Gradient (no regularization)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Problem with gradient descent: Takes very similar directions many times\n", - "\n", - "Solution: Enforce conjugacy\n", - "\n", - "`Step 1: Start with an initial point \n", - "while(not converged) {\n", - " Step 2: Compute gradient dw.\n", - " Step 3: Compute stepsize alpha.\n", - " Step 4: Compute next direction p by enforcing conjugacy with previous direction.\n", - " Step 5: Update: w_new = w_old + alpha*p\n", - "}`\n", - "\n", - "![Gradient Descent vs Conjugate Gradient](http://i.stack.imgur.com/zh1HH.png)\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "script = \"\"\"\n", - " # add constant feature to X to model intercepts\n", - " X = cbind(X, matrix(1, rows=nrow(X), cols=1))\n", - " m = ncol(X); i = 1; \n", - " max_iter = 20;\n", - " w = matrix (0, rows = m, cols = 1); # initialize weights to 0\n", - " dw = - t(X) %*% y; p = - dw; # dw = (X'X)w - (X'y)\n", - " norm_r2 = sum (dw ^ 2); \n", - " for(i in 1:max_iter) {\n", - " q = t(X) %*% (X %*% p)\n", - " alpha = norm_r2 / sum (p * q); # Minimizes f(w - alpha*r)\n", - " w = w + alpha * p; # update weights\n", - " dw = dw + alpha * q; \n", - " old_norm_r2 = norm_r2; norm_r2 = sum (dw ^ 2);\n", - " p = -dw + (norm_r2 / old_norm_r2) * p; # next direction - conjugacy to previous direction\n", - " i = i + 1;\n", - " }\n", - " bias = as.scalar(w[nrow(w),1])\n", - " w = w[1:nrow(w)-1,] \n", - "\"\"\"" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "prog = dml(script).input(X=diabetes_X_train, y=diabetes_y_train).output('w').output('bias')\n", - "w, bias = ml.execute(prog).get('w','bias')\n", - "w = w.toNumPy()" - ] - }, - { - "cell_type": "code", - "execution_count": 
null, - "metadata": { - "scrolled": false - }, - "outputs": [], - "source": [ - "plt.scatter(diabetes_X_train, diabetes_y_train, color='black')\n", - "plt.scatter(diabetes_X_test, diabetes_y_test, color='red')\n", - "\n", - "plt.plot(diabetes_X_test, (w*diabetes_X_test)+bias, color='red', linestyle ='dashed')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Example 3: Invoke existing SystemML algorithm script LinearRegDS.dml using MLContext API" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import os\n", - "from subprocess import call\n", - "\n", - "dirName = os.path.dirname(os.path.realpath(\"~\")) + \"/scripts\"\n", - "call([\"mkdir\", \"-p\", dirName])\n", - "call([\"wget\", \"-N\", \"-q\", \"-P\", dirName, \"https://raw.githubusercontent.com/apache/systemml/master/scripts/algorithms/LinearRegDS.dml\"])\n", - "\n", - "scriptName = dirName + \"/LinearRegDS.dml\"\n", - "dml_script = dmlFromResource(scriptName)\n", - "\n", - "prog = dml_script.input(X=diabetes_X_train, y=diabetes_y_train).input('$icpt',1.0).output('beta_out')\n", - "w = ml.execute(prog).get('beta_out')\n", - "w = w.toNumPy()\n", - "bias=w[1]\n", - "print (bias)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "plt.scatter(diabetes_X_train, diabetes_y_train, color='black')\n", - "plt.scatter(diabetes_X_test, diabetes_y_test, color='red')\n", - "\n", - "plt.plot(diabetes_X_test, (w[0]*diabetes_X_test)+bias, color='red', linestyle ='dashed')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Example 4: Invoke existing SystemML algorithm using scikit-learn/SparkML pipeline like API" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "*mllearn* API allows a Python programmer to invoke SystemML's algorithms using scikit-learn like API as well as Spark's MLPipeline API." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "from pyspark.sql import SQLContext\n", - "from systemml.mllearn import LinearRegression\n", - "sqlCtx = SQLContext(sc)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "regr = LinearRegression(sqlCtx)\n", - "# Train the model using the training sets\n", - "regr.fit(diabetes_X_train, diabetes_y_train)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "collapsed": true - }, - "outputs": [], - "source": [ - "predictions = regr.predict(diabetes_X_test)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Use the trained model to perform prediction\n", - "%matplotlib inline\n", - "plt.scatter(diabetes_X_train, diabetes_y_train, color='black')\n", - "plt.scatter(diabetes_X_test, diabetes_y_test, color='red')\n", - "\n", - "plt.plot(diabetes_X_test, predictions, color='black')" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 2", - "language": "python", - "name": "python2" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 2 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython2", - "version": "2.7.13" - } - }, - "nbformat": 4, - "nbformat_minor": 1 -}
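The training cell of the deleted Deep_Learning_Image_Classification.ipynb does its preprocessing in DML. It scales pixels from [0,255] to [-1,1] and one-hot encodes the labels with `table(seq(1, n), labels+1, n, 10)`. It then holds out the first 5,000 rows for validation.

The sketch below restates those three steps in NumPy so that the 1-based vs 0-based index arithmetic is visible. NumPy, the hypothetical function name `preprocess_mnist`, and the random stand-in data are assumptions for illustration only. They are not part of the notebook or of SystemML.

```python
import numpy as np


def preprocess_mnist(images, labels, n_val=5000, num_classes=10):
    """Hypothetical NumPy restatement of the DML preprocessing in the training cell."""
    # DML: images = (images / 255) * 2 - 1   -> grayscale [0, 255] mapped to [-1, 1]
    images = (images.astype(np.float64) / 255.0) * 2.0 - 1.0
    n = images.shape[0]
    # DML: labels = table(seq(1, n), labels+1, n, 10)
    # The "+1" exists because DML columns are 1-based; NumPy columns are 0-based.
    onehot = np.zeros((n, num_classes))
    onehot[np.arange(n), labels.ravel().astype(np.int64)] = 1.0
    # DML rows 1..5000 (validation) and 5001..n (training), as 0-based slices
    X_val, y_val = images[:n_val], onehot[:n_val]
    X, y = images[n_val:], onehot[n_val:]
    return X, y, X_val, y_val


# Random stand-in data (NOT real MNIST): 6,000 images of 28x28 = 784 pixels
rng = np.random.default_rng(0)
raw_images = rng.integers(0, 256, size=(6000, 784))
labels = rng.integers(0, 10, size=(6000, 1)).astype(np.uint8)

X, y, X_val, y_val = preprocess_mnist(raw_images, labels)
print(X.shape, y.shape, X_val.shape, y_val.shape)
```

The one-hot step uses NumPy fancy indexing in place of DML's contingency-table builtin `table()`. The slice bounds differ from the DML ranges by one because DML ranges are 1-based and inclusive.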
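The least-squares cell in the deleted Linear_Regression_Algorithms_Demo.ipynb writes its objective and normal equations in plain text. Below they are restated in LaTeX, with the objective scaled by 1/2 so that its gradient is exactly the `dw = (X'X)w - (X'y)` used by all three algorithm cells. The exact line-search step from the gradient-descent cell is included as well, with $r$ denoting that gradient:

```latex
\begin{aligned}
w^* &= \arg\min_w \tfrac{1}{2}\lVert Xw - y\rVert_2^2
     = \arg\min_w \left( \tfrac{1}{2}\, w^\top X^\top X\, w - w^\top X^\top y \right) \\
\nabla f(w) &= X^\top X\, w - X^\top y = 0
  \;\Longrightarrow\; w^* = (X^\top X)^{-1} X^\top y = \mathrm{solve}(X^\top X,\; X^\top y) \\
\alpha^* &= \arg\min_\alpha f(w + \alpha r) = -\frac{r^\top r}{r^\top X^\top X\, r},
  \qquad r = \nabla f(w) = X^\top X\, w - X^\top y
\end{aligned}
```

The step size follows from setting $\tfrac{d}{d\alpha} f(w + \alpha r) = r^\top r + \alpha\, r^\top X^\top X\, r$ to zero. Because $\alpha^* < 0$, the update $w + \alpha^* r$ moves against the gradient.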
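The direct-solve cell (Algorithm 1) of the deleted linear-regression notebook works in three steps. It appends a column of ones to model the intercept, forms X'X and X'y, and calls DML's `solve`.

Below is a NumPy sketch of the same computation. It assumes `numpy.linalg.solve` as a stand-in for DML `solve` and uses a hypothetical function name, `linreg_direct_solve`. The synthetic data is noise-free y = 3x + 2, so the fit should recover w = 3 and bias = 2 up to rounding.

```python
import numpy as np


def linreg_direct_solve(X, y):
    """Hypothetical sketch: solve the normal equations (X'X) w = X'y with an intercept."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # cbind(X, matrix(1, rows=nrow(X), cols=1))
    A = Xb.T @ Xb                                    # t(X) %*% X
    b = Xb.T @ y                                     # t(X) %*% y
    w = np.linalg.solve(A, b)                        # solve(A, b)
    # Last coefficient is the intercept; the remaining rows are feature weights
    return w[:-1], float(w[-1, 0])


# Synthetic, noise-free data: y = 3x + 2
x = np.linspace(-1.0, 1.0, 21).reshape(-1, 1)
y = 3.0 * x + 2.0
w, bias = linreg_direct_solve(x, y)
print(w.ravel(), bias)
```

Forming X'X squares the condition number of X. For ill-conditioned features, `numpy.linalg.lstsq(Xb, y, rcond=None)` solves the same least-squares problem without forming X'X explicitly.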
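The batch gradient-descent cell (Algorithm 2) of the deleted linear-regression notebook loops over three steps. It computes the gradient dw = (X'X)w - (X'y), chooses the step alpha = -(r'r)/(r'X'Xr) that minimizes the objective along that gradient, and updates w = w + alpha*dw.

Below is a minimal NumPy sketch of that loop. It assumes NumPy in place of DML, a hypothetical function name `linreg_gradient_descent`, and synthetic noise-free data. Two departures from the DML cell are deliberate and are marked in comments:

- X'X is computed once, outside the loop.
- The loop stops early once the squared gradient norm drops below `tol`. The DML cell always runs a fixed 100 iterations.

```python
import numpy as np


def linreg_gradient_descent(X, y, max_iter=500, tol=1e-12):
    """Hypothetical sketch: steepest descent with exact line search on 1/2 w'X'Xw - w'X'y."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    XtX = Xb.T @ Xb   # computed once here; the DML cell recomputes it every iteration
    Xty = Xb.T @ y
    w = np.zeros((Xb.shape[1], 1))                  # Step 1: initial point
    for _ in range(max_iter):
        dw = XtX @ w - Xty                          # Step 2: gradient
        rr = float(np.sum(dw * dw))
        if rr < tol:                                # early stop (not in the DML cell)
            break
        curvature = float(np.sum(dw * (XtX @ dw)))  # dw' X'X dw, positive for full-rank X
        alpha = -rr / curvature                     # Step 3: exact line search, alpha < 0
        w = w + alpha * dw                          # Step 4: update
    return w[:-1], float(w[-1, 0])


# Synthetic, noise-free data: y = 3x + 2
x = np.linspace(-1.0, 1.0, 21).reshape(-1, 1)
y = 3.0 * x + 2.0
w_gd, bias_gd = linreg_gradient_descent(x, y)
print(w_gd.ravel(), bias_gd)
```

Steepest descent converges linearly, at a rate governed by the condition number of X'X. On poorly scaled features it can need many more iterations than the fixed 100 in the notebook. That zig-zag behaviour is what the conjugate-gradient cell is meant to fix.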
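The conjugate-gradient cell (Algorithm 3) of the deleted linear-regression notebook replaces the steepest-descent direction with one that is X'X-conjugate to the previous direction. In exact arithmetic this reaches the solution of an m-unknown problem in at most m steps.

Below is a NumPy sketch of the same recurrence. It assumes NumPy in place of DML and a hypothetical function name, `linreg_conjugate_gradient`. The stopping test on `norm_r2` is an addition that the fixed 20-iteration DML loop does not have. It avoids dividing by a vanishing `sum(p * q)` once the residual has reached zero.

```python
import numpy as np


def linreg_conjugate_gradient(X, y, max_iter=20, tol=1e-20):
    """Hypothetical sketch: conjugate gradient on the normal equations (X'X) w = X'y."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    w = np.zeros((Xb.shape[1], 1))                  # initialize weights to 0
    dw = -(Xb.T @ y)                                # gradient (X'X)w - X'y at w = 0
    p = -dw                                         # first direction: steepest descent
    norm_r2 = float(np.sum(dw ** 2))
    for _ in range(max_iter):
        if norm_r2 < tol:                           # stopping test (not in the DML cell)
            break
        q = Xb.T @ (Xb @ p)                         # (X'X) p without forming X'X
        alpha = norm_r2 / float(np.sum(p * q))      # exact step length along p
        w = w + alpha * p                           # update weights
        dw = dw + alpha * q                         # update gradient
        old_norm_r2, norm_r2 = norm_r2, float(np.sum(dw ** 2))
        p = -dw + (norm_r2 / old_norm_r2) * p       # next direction, conjugate to previous
    return w[:-1], float(w[-1, 0])


# Synthetic, noise-free data: y = 3x + 2 (two unknowns: weight and intercept)
x = np.linspace(-1.0, 1.0, 21).reshape(-1, 1)
y = 3.0 * x + 2.0
w_cg, bias_cg = linreg_conjugate_gradient(x, y)
print(w_cg.ravel(), bias_cg)
```

Computing `q` as `X'(X p)` mirrors the DML cell's `t(X) %*% (X %*% p)`. This keeps each iteration at two matrix-vector products and never materializes X'X, which matters when X has many columns.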