[SYSTEMML-1170] Clean Up Python Documentation For Next Release

Cleanup of Python documentation.

Closes #335.


Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/94cf7c15
Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/94cf7c15
Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/94cf7c15

Branch: refs/heads/gh-pages
Commit: 94cf7c15b161a729f50ffec84e761b343e3ab2f9
Parents: 8268255
Author: Mike Dusenberry <[email protected]>
Authored: Mon Jan 9 14:02:08 2017 -0800
Committer: Mike Dusenberry <[email protected]>
Committed: Mon Jan 9 14:02:08 2017 -0800

----------------------------------------------------------------------
 README.md                            |   3 +-
 beginners-guide-python.md            | 128 ++++++++++++++++++------------
 index.md                             |  13 +--
 spark-mlcontext-programming-guide.md |  66 +++++++--------
 4 files changed, 111 insertions(+), 99 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/94cf7c15/README.md
----------------------------------------------------------------------
diff --git a/README.md b/README.md
index 6906c8d..5a4b175 100644
--- a/README.md
+++ b/README.md
@@ -27,6 +27,7 @@ Jekyll (and optionally Pygments) can be installed on the Mac OS in the following
     $ brew install ruby
     $ gem install jekyll
     $ gem install jekyll-redirect-from
+    $ gem install bundler
     $ brew install python
     $ pip install Pygments
     $ gem install pygments.rb
@@ -38,4 +39,4 @@ documentation. From there, you can have Jekyll convert the markdown files to HTM
 Jekyll will serve up the generated documentation by default at http://127.0.0.1:4000. Modifications
 to *.md files will be converted to HTML and can be viewed in a web browser.
 
-    $ jekyll serve -w
\ No newline at end of file
+    $ jekyll serve -w

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/94cf7c15/beginners-guide-python.md
----------------------------------------------------------------------
diff --git a/beginners-guide-python.md b/beginners-guide-python.md
index c919f3f..8bd957a 100644
--- a/beginners-guide-python.md
+++ b/beginners-guide-python.md
@@ -54,7 +54,8 @@ If you already have an Apache Spark installation, you can skip this step.
 /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
 brew tap caskroom/cask
 brew install Caskroom/cask/java
-brew install apache-spark
+brew tap homebrew/versions
+brew install apache-spark16
 ```
 </div>
 <div data-lang="Linux" markdown="1">
@@ -70,37 +71,60 @@ brew install apache-spark16
 
 ### Install SystemML
 
-We are working towards uploading the python package on pypi. Until then, please use following commands: 
+We are working towards uploading the Python package to PyPI. Until then, please use the following
+commands:
 
+<div class="codetabs">
+<div data-lang="Python 2" markdown="1">
 ```bash
 git clone https://github.com/apache/incubator-systemml.git
 cd incubator-systemml
 mvn clean package -P distribution
 pip install target/systemml-0.12.0-incubating-SNAPSHOT-python.tgz
 ```
-
-The above commands will install Python package and place the corresponding Java binaries (along with algorithms) into the installed location.
-To find the location of the downloaded Java binaries, use the following command:
-
+</div>
+<div data-lang="Python 3" markdown="1">
 ```bash
-python -c 'import imp; import os; print os.path.join(imp.find_module("systemml")[1], "systemml-java")'
+git clone https://github.com/apache/incubator-systemml.git
+cd incubator-systemml
+mvn clean package -P distribution
+pip3 install target/systemml-0.12.0-incubating-SNAPSHOT-python.tgz
 ```
+</div>
+</div>
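
As a quick post-install sanity check, the install location of the package (which also holds the bundled Java binaries) can be printed. This is a hedged sketch; it assumes only that the package installs under the name `systemml`:

```python
# Hedged sanity check: confirm the package imports and locate the
# installation (the bundled Java binaries live alongside it).
import os
import systemml
print(os.path.dirname(systemml.__file__))
```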
 
-Note: the user is free to either use the prepackaged Java binaries 
-or download them from [SystemML website](http://systemml.apache.org/download.html) 
-or build them from the [source](https://github.com/apache/incubator-systemml).
-
+### Uninstall SystemML
 To uninstall SystemML, please use the following command:
 
+<div class="codetabs">
+<div data-lang="Python 2" markdown="1">
 ```bash
-pip uninstall systemml-incubating
+pip uninstall systemml
 ```
+</div>
+<div data-lang="Python 3" markdown="1">
+```bash
+pip3 uninstall systemml
+```
+</div>
+</div>
 
 ### Start Pyspark shell
 
+<div class="codetabs">
+<div data-lang="Python 2" markdown="1">
 ```bash
-pyspark --master local[*]
+pyspark
 ```
+</div>
+<div data-lang="Python 3" markdown="1">
+```bash
+PYSPARK_PYTHON=python3 pyspark
+```
+</div>
+</div>
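
Once the shell is up, a minimal smoke test can confirm that SystemML is reachable. A hedged sketch, assuming the `systemml` package from the install step above:

```python
# Inside the PySpark shell: build a small SystemML matrix and sum it.
import numpy as np
import systemml as sml
m = sml.matrix(np.ones((3, 3)))
print(m.sum().toNumPy())  # expect a 1x1 array containing 9.0
```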
+
+---
 
 ## Matrix operations
 
@@ -118,20 +142,20 @@ m4.sum(axis=1).toNumPy()
 
 Output:
 
-```bash
+```python
 array([[-60.],
        [-60.],
        [-60.]])
 ```
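
For readers without a SystemML setup at hand, the same row-wise reduction can be illustrated in plain NumPy. A sketch only; `m4` here is a hypothetical stand-in, not the guide's matrix:

```python
import numpy as np
# Hypothetical stand-in: four columns of -15 per row sum to -60 per row.
m4 = np.full((3, 4), -15.0)
print(m4.sum(axis=1, keepdims=True))
# [[-60.]
#  [-60.]
#  [-60.]]
```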
 
 Let us now write a simple script to train [linear regression](https://apache.github.io/incubator-systemml/algorithms-regression.html#linear-regression)
-model: $ \beta = solve(X^T X, X^T y) $. For simplicity, we will use direct-solve method and ignore regularization parameter as well as intercept. 
+model: $ \beta = solve(X^T X, X^T y) $. For simplicity, we will use the direct-solve method and
+ignore the regularization parameter as well as the intercept.
 
 ```python
 import numpy as np
 from sklearn import datasets
 import systemml as sml
-from pyspark.sql import SQLContext
 # Load the diabetes dataset
 diabetes = datasets.load_diabetes()
 # Use only one feature
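
The hunk above elides the body of the script; as a hedged, NumPy-only illustration of the direct-solve formula $ \beta = solve(X^T X, X^T y) $ (stand-in names, not the SystemML API):

```python
import numpy as np
# Normal-equations solve on synthetic data: beta = solve(X^T X, X^T y).
rng = np.random.RandomState(0)
X = rng.rand(100, 1)                      # one feature, no intercept
y = 3.0 * X[:, 0] + 0.1 * rng.randn(100)  # noisy linear response
beta = np.linalg.solve(X.T.dot(X), X.T.dot(y))
print(beta)  # close to [3.0]
```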
@@ -158,7 +182,10 @@ Output:
 Residual sum of squares: 25282.12
 ```
 
-We can improve the residual error by adding an intercept and regularization parameter. To do so, we will use `mllearn` API described in the next section.
+We can improve the residual error by adding an intercept and a regularization parameter. To do so,
+we will use the `mllearn` API described in the next section.
+
+---
 
 ## Invoke SystemML's algorithms
 
@@ -206,7 +233,7 @@ algorithm on digits datasets.
 
 ```python
 # Scikit-learn way
-from sklearn import datasets, neighbors
+from sklearn import datasets
 from systemml.mllearn import LogisticRegression
 from pyspark.sql import SQLContext
 sqlCtx = SQLContext(sc)
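
The full example is elided by the hunk above; the overall shape of the `mllearn` workflow is sketched below. Hedged: `X_train`, `y_train`, `X_test`, and `y_test` are hypothetical NumPy arrays, not variables from the guide:

```python
# Hedged sketch of the scikit-learn-style mllearn workflow.
from systemml.mllearn import LogisticRegression
from pyspark.sql import SQLContext

logistic = LogisticRegression(SQLContext(sc))  # `sc` from the PySpark shell
logistic.fit(X_train, y_train)                 # NumPy inputs are accepted
print(logistic.score(X_test, y_test))
```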
@@ -233,7 +260,7 @@ LogisticRegression score: 0.922222
 To train the above algorithm on larger dataset, we can load the dataset into DataFrame and pass it to the `fit` method:
 
 ```python
-from sklearn import datasets, neighbors
+from sklearn import datasets
 from systemml.mllearn import LogisticRegression
 from pyspark.sql import SQLContext
 import pandas as pd
@@ -245,7 +272,7 @@ X_digits = digits.data
 y_digits = digits.target
 n_samples = len(X_digits)
 # Split the data into training/testing sets and convert to PySpark DataFrame
-df_train = sml.convertToLabeledDF(sqlContext, X_digits[:int(.9 * n_samples)], y_digits[:int(.9 * n_samples)])
+df_train = sml.convertToLabeledDF(sqlCtx, X_digits[:int(.9 * n_samples)], y_digits[:int(.9 * n_samples)])
 X_test = sqlCtx.createDataFrame(pd.DataFrame(X_digits[int(.9 * n_samples):]))
 logistic = LogisticRegression(sqlCtx)
 logistic.fit(df_train)
@@ -274,18 +301,18 @@ from pyspark.ml.feature import HashingTF, Tokenizer
 from pyspark.sql import SQLContext
 sqlCtx = SQLContext(sc)
 training = sqlCtx.createDataFrame([
-    (0L, "a b c d e spark", 1.0),
-    (1L, "b d", 2.0),
-    (2L, "spark f g h", 1.0),
-    (3L, "hadoop mapreduce", 2.0),
-    (4L, "b spark who", 1.0),
-    (5L, "g d a y", 2.0),
-    (6L, "spark fly", 1.0),
-    (7L, "was mapreduce", 2.0),
-    (8L, "e spark program", 1.0),
-    (9L, "a e c l", 2.0),
-    (10L, "spark compile", 1.0),
-    (11L, "hadoop software", 2.0)
+    (0, "a b c d e spark", 1.0),
+    (1, "b d", 2.0),
+    (2, "spark f g h", 1.0),
+    (3, "hadoop mapreduce", 2.0),
+    (4, "b spark who", 1.0),
+    (5, "g d a y", 2.0),
+    (6, "spark fly", 1.0),
+    (7, "was mapreduce", 2.0),
+    (8, "e spark program", 1.0),
+    (9, "a e c l", 2.0),
+    (10, "spark compile", 1.0),
+    (11, "hadoop software", 2.0)
 ], ["id", "text", "label"])
 tokenizer = Tokenizer(inputCol="text", outputCol="words")
 hashingTF = HashingTF(inputCol="words", outputCol="features", numFeatures=20)
@@ -293,10 +320,10 @@ lr = LogisticRegression(sqlCtx)
 pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])
 model = pipeline.fit(training)
 test = sqlCtx.createDataFrame([
-    (12L, "spark i j k"),
-    (13L, "l m n"),
-    (14L, "mapreduce spark"),
-    (15L, "apache hadoop")], ["id", "text"])
+    (12, "spark i j k"),
+    (13, "l m n"),
+    (14, "mapreduce spark"),
+    (15, "apache hadoop")], ["id", "text"])
 prediction = model.transform(test)
 prediction.show()
 ```
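
Dropping the `0L`-style long literals keeps the example valid under Python 3, which removed the `L` suffix; plain `int` literals work in both versions. As a hedged follow-up using standard PySpark DataFrame methods, the predicted labels can be pulled out on their own:

```python
# Show only the id and predicted label columns of the result.
prediction.select("id", "prediction").show()
```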
@@ -304,27 +331,28 @@ prediction.show()
 Output:
 
 ```bash
-+--+---------------+--------------------+--------------------+--------------------+---+----------+
-|id|           text|               words|            features|         probability| ID|prediction|
-+--+---------------+--------------------+--------------------+--------------------+---+----------+
-|12|    spark i j k|ArrayBuffer(spark...|(20,[5,6,7],[2.0,...|[0.99999999999975...|1.0|       1.0|
-|13|          l m n|ArrayBuffer(l, m, n)|(20,[8,9,10],[1.0...|[1.37552128844736...|2.0|       2.0|
-|14|mapreduce spark|ArrayBuffer(mapre...|(20,[5,10],[1.0,1...|[0.99860290938153...|3.0|       1.0|
-|15|  apache hadoop|ArrayBuffer(apach...|(20,[9,14],[1.0,1...|[5.41688748236143...|4.0|       2.0|
-+--+---------------+--------------------+--------------------+--------------------+---+----------+
++-------+---+---------------+------------------+--------------------+--------------------+----------+
+|__INDEX| id|           text|             words|            features|         probability|prediction|
++-------+---+---------------+------------------+--------------------+--------------------+----------+
+|    1.0| 12|    spark i j k|  [spark, i, j, k]|(20,[5,6,7],[2.0,...|[0.99999999999975...|       1.0|
+|    2.0| 13|          l m n|         [l, m, n]|(20,[8,9,10],[1.0...|[1.37552128844736...|       2.0|
+|    3.0| 14|mapreduce spark|[mapreduce, spark]|(20,[5,10],[1.0,1...|[0.99860290938153...|       1.0|
+|    4.0| 15|  apache hadoop|  [apache, hadoop]|(20,[9,14],[1.0,1...|[5.41688748236143...|       2.0|
++-------+---+---------------+------------------+--------------------+--------------------+----------+
 ```
 
+---
+
 ## Invoking DML/PyDML scripts using MLContext
 
 The below example demonstrates how to invoke the algorithm [scripts/algorithms/MultiLogReg.dml](https://github.com/apache/incubator-systemml/blob/master/scripts/algorithms/MultiLogReg.dml)
 using Python [MLContext API](https://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide).
 
 ```python
-from sklearn import datasets, neighbors
-from pyspark.sql import DataFrame, SQLContext
+from sklearn import datasets
+from pyspark.sql import SQLContext
 import systemml as sml
 import pandas as pd
-import os, imp
 sqlCtx = SQLContext(sc)
 digits = datasets.load_digits()
 X_digits = digits.data
@@ -334,8 +362,8 @@ n_samples = len(X_digits)
 X_df = sqlCtx.createDataFrame(pd.DataFrame(X_digits[:int(.9 * n_samples)]))
 y_df = sqlCtx.createDataFrame(pd.DataFrame(y_digits[:int(.9 * n_samples)]))
 ml = sml.MLContext(sc)
-# Get the path of MultiLogReg.dml
-scriptPath = os.path.join(imp.find_module("systemml")[1], 'systemml-java', 'scripts', 'algorithms', 'MultiLogReg.dml')
-script = sml.dml(scriptPath).input(X=X_df, Y_vec=y_df).output("B_out")
+# Run the MultiLogReg.dml script at the given URL
+scriptUrl = "https://raw.githubusercontent.com/apache/incubator-systemml/master/scripts/algorithms/MultiLogReg.dml"
+script = sml.dml(scriptUrl).input(X=X_df, Y_vec=y_df).output("B_out")
 beta = ml.execute(script).get('B_out').toNumPy()
 ```
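
Since `beta` comes back as a plain NumPy array, it can be inspected directly. A hedged follow-up, not part of the original script:

```python
# The fitted coefficients are returned as an ordinary NumPy array.
print(type(beta), beta.shape)
```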

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/94cf7c15/index.md
----------------------------------------------------------------------
diff --git a/index.md b/index.md
index 6b91654..fe8361a 100644
--- a/index.md
+++ b/index.md
@@ -42,13 +42,11 @@ To download SystemML, visit the [downloads](http://systemml.apache.org/download)
 
 ## Running SystemML
 
+* **[Beginner's Guide For Python Users](beginners-guide-python)** - Beginner's Guide for Python users.
 * **[Spark MLContext](spark-mlcontext-programming-guide)** - Spark MLContext is a programmatic API
 for running SystemML from Spark via Scala, Python, or Java.
-  * See the [Spark MLContext Programming Guide](spark-mlcontext-programming-guide) with the
-  following examples:
-    * [**Spark Shell (Scala)**](spark-mlcontext-programming-guide#spark-shell-example---new-api)
-    * [**Zeppelin Notebook (Scala)**](spark-mlcontext-programming-guide#zeppelin-notebook-example---linear-regression-algorithm---old-api)
-    * [**Jupyter Notebook (PySpark)**](spark-mlcontext-programming-guide#jupyter-pyspark-notebook-example---poisson-nonnegative-matrix-factorization---old-api)
+  * [**Spark Shell Example (Scala)**](spark-mlcontext-programming-guide#spark-shell-example)
+  * [**Jupyter Notebook Example (PySpark)**](spark-mlcontext-programming-guide#jupyter-pyspark-notebook-example---poisson-nonnegative-matrix-factorization)
 * **[Spark Batch](spark-batch-mode)** - Algorithms are automatically optimized to run across Spark clusters.
   * See [Invoking SystemML in Spark Batch Mode](spark-batch-mode) for detailed information.
 * **[Hadoop Batch](hadoop-batch-mode)** - Algorithms are automatically optimized when distributed across Hadoop clusters.
@@ -62,16 +60,13 @@ machine in R-like and Python-like declarative languages.
 
 ## Language Guides
 
+* [Python API Reference](python-reference) - API Reference Guide for Python users.
 * [DML Language Reference](dml-language-reference) -
 DML is a high-level R-like declarative language for machine learning.
 * **PyDML Language Reference** **(Coming Soon)** -
 PyDML is a high-level Python-like declarative language for machine learning.
 * [Beginner's Guide to DML and PyDML](beginners-guide-to-dml-and-pydml) -
 An introduction to the basics of DML and PyDML.
-* [Beginner's Guide for Python users](beginners-guide-python) -
-Beginner's Guide for Python users.
-* [Reference Guide for Python users](python-reference) -
-Reference Guide for Python users.
 
 ## ML Algorithms
 

http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/94cf7c15/spark-mlcontext-programming-guide.md
----------------------------------------------------------------------
diff --git a/spark-mlcontext-programming-guide.md b/spark-mlcontext-programming-guide.md
index fbc8f5b..dcaa125 100644
--- a/spark-mlcontext-programming-guide.md
+++ b/spark-mlcontext-programming-guide.md
@@ -35,14 +35,10 @@ such as Scala, Java, and Python. As a result, it offers a convenient way to inte
 Shell and from Notebooks such as Jupyter and Zeppelin.
 
 **NOTE: A new MLContext API has been redesigned for future SystemML releases. The old API is available
-in all versions of SystemML but will be deprecated and removed, so please migrate to the new API.**
+in previous versions of SystemML but is deprecated and will be removed soon, so please migrate to the new API.**
 
 
-# Spark Shell Example - NEW API
-
-**NOTE: The new MLContext API will be available in future SystemML releases. It can be used
-by building the project using Maven ('mvn clean package', or 'mvn clean package -P distribution').
-For SystemML version 0.10.0 and earlier, please see the documentation regarding the old API.**
+# Spark Shell Example
 
 ## Start Spark Shell with SystemML
 
@@ -1644,25 +1640,8 @@ scala> for (i <- 1 to 5) {
 
 # Jupyter (PySpark) Notebook Example - Poisson Nonnegative Matrix Factorization
 
-Similar to the Scala API, SystemML also provides a Python MLContext API.  In addition to the
-regular `SystemML.jar` file, you'll need to install the Python API as follows:
-
-  * Latest release:
-    * Python 2:
-
-      ```
-      pip install systemml
-      # Bleeding edge: pip install git+git://github.com/apache/incubator-systemml.git#subdirectory=src/main/python
-      ```
-
-    * Python 3:
-
-      ```
-      pip3 install systemml
-      # Bleeding edge: pip3 install git+git://github.com/apache/incubator-systemml.git#subdirectory=src/main/python
-      ```
-  * Don't forget to download the `SystemML.jar` file, which can be found in the latest release, or
-  in a nightly build.
+Similar to the Scala API, SystemML also provides a Python MLContext API. Before using it, you'll
+need to **[install it first](beginners-guide-python#download--setup)**.
 
 Here, we'll explore the use of SystemML via PySpark in a [Jupyter notebook](http://jupyter.org/).
 This Jupyter notebook example can be nicely viewed in a rendered state
@@ -1671,17 +1650,18 @@ and can be [downloaded here](https://raw.githubusercontent.com/apache/incubator-
 
 From the directory with the downloaded notebook, start Jupyter with PySpark:
 
-  * Python 2:
-
-    ```
-    PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark --master local[*] --driver-class-path SystemML.jar --jars SystemML.jar
-    ```
-
-  * Python 3:
-
-    ```
-    PYSPARK_PYTHON=python3 PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark --master local[*] --driver-class-path SystemML.jar --jars SystemML.jar
-    ```
+<div class="codetabs">
+<div data-lang="Python 2" markdown="1">
+```bash
+PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark
+```
+</div>
+<div data-lang="Python 3" markdown="1">
+```bash
+PYSPARK_PYTHON=python3 PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark
+```
+</div>
+</div>
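
A hedged first cell for the notebook, assuming the install step from the Python beginner's guide was followed (`sc` is supplied by the PySpark driver):

```python
# Sanity check: confirm which SystemML installation the notebook sees.
import systemml as sml
print(sml.__file__)
```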
 
 This will open Jupyter in a browser:
 
@@ -1797,6 +1777,9 @@ plt.title('PNMF Training Loss')
 
 # Spark Shell Example - OLD API
 
+### **NOTE: This API is old and has been deprecated.**
+**Please use the [new MLContext API](spark-mlcontext-programming-guide#spark-shell-example) instead.**
+
 ## Start Spark Shell with SystemML
 
 To use SystemML with the Spark Shell, the SystemML jar can be referenced using the Spark Shell's `--jars` option.
@@ -2216,11 +2199,13 @@ val (min, max, mean) = minMaxMean(sysMlMatrix, numRows, numCols, ml)
 
 </div>
 
-
-* * *
+---
 
 # Zeppelin Notebook Example - Linear Regression Algorithm - OLD API
 
+### **NOTE: This API is old and has been deprecated.**
+**Please use the [new MLContext API](spark-mlcontext-programming-guide#spark-shell-example) instead.**
+
 Next, we'll consider an example of a SystemML linear regression algorithm run from Spark through an Apache Zeppelin notebook.
 Instructions to clone and build Zeppelin can be found at the [GitHub Apache Zeppelin](https://github.com/apache/incubator-zeppelin)
 site. This example also will look at the Spark ML linear regression algorithm.
@@ -2701,10 +2686,13 @@ Training time per iter: 0.2334166666666667 seconds
 {% endhighlight %}
 
 
-* * *
+---
 
 # Jupyter (PySpark) Notebook Example - Poisson Nonnegative Matrix Factorization - OLD API
 
+### **NOTE: This API is old and has been deprecated.**
+**Please use the [new MLContext API](spark-mlcontext-programming-guide#jupyter-pyspark-notebook-example---poisson-nonnegative-matrix-factorization) instead.**
+
 Here, we'll explore the use of SystemML via PySpark in a [Jupyter notebook](http://jupyter.org/).
 This Jupyter notebook example can be nicely viewed in a rendered state
 [on GitHub](https://github.com/apache/incubator-systemml/blob/master/samples/jupyter-notebooks/SystemML-PySpark-Recommendation-Demo.ipynb),
