Repository: mahout
Updated Branches:
  refs/heads/website 2a8139dcf -> bcdad32ac


http://git-wip-us.apache.org/repos/asf/mahout/blob/a60c79e7/website/docs/0.13.0/tutorials/how-to-build-an-app.md
----------------------------------------------------------------------
diff --git a/website/docs/0.13.0/tutorials/how-to-build-an-app.md 
b/website/docs/0.13.0/tutorials/how-to-build-an-app.md
new file mode 100644
index 0000000..9cf624b
--- /dev/null
+++ b/website/docs/0.13.0/tutorials/how-to-build-an-app.md
@@ -0,0 +1,255 @@
+---
+layout: mahoutdoc
+title: How to Build an App
+permalink: /docs/0.13.0/tutorials/build-app
+---
+#How to create an App using Mahout
+
+This is an example of how to create a simple app using Mahout as a library. 
The source is available on GitHub in the [3-input-cooc 
project](https://github.com/pferrel/3-input-cooc) with more explanation of 
what it does (it relates to collaborative filtering). For this tutorial we'll 
concentrate on the app rather than the data science.
+
+The app reads in three types of user-item interactions and creates indicators 
for them using cooccurrence and cross-cooccurrence. The indicators will be 
written to text files in a format ready for indexing in a search-engine-based 
recommender.
+
+##Setup
+In order to build and run the CooccurrenceDriver you need to install the 
following:
+
+* Install the Java 7 JDK from Oracle. Mac users look here: [Java SE 
Development Kit 
7u72](http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html).
+* Install sbt (simple build tool) 0.13.x for 
[Mac](http://www.scala-sbt.org/release/tutorial/Installing-sbt-on-Mac.html), 
[Linux](http://www.scala-sbt.org/release/tutorial/Installing-sbt-on-Linux.html) 
or [manual 
installation](http://www.scala-sbt.org/release/tutorial/Manual-Installation.html).
+* Install [Spark 
1.1.1](https://spark.apache.org/docs/1.1.1/spark-standalone.html). Don't forget 
to set up SPARK_HOME.
+* Install [Mahout 0.10.0](http://mahout.apache.org/general/downloads.html). 
Don't forget to set up MAHOUT_HOME and MAHOUT_LOCAL.
+
+Why install these if you are only using them as libraries? Certain binaries 
and scripts are required by the libraries to get information about the 
environment, such as discovering where jars are located.
+
+Spark requires a set of jars on the classpath for the client-side part of an 
app, and another set of jars must be passed to the Spark context for running 
distributed code. The example should discover all the necessary classes 
automatically.
+
+##Application
+Using Mahout as a library in an application requires a little Scala code. 
Scala has an App trait, so we'll create an object that inherits from ```App```:
+
+
+    object CooccurrenceDriver extends App {
+    }
+    
+
+This will look a little different from Java: ```App``` performs delayed 
initialization, which causes the object body to be executed when the app is 
launched, much as a ```main``` method's body would be in Java.
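+As a rough sketch (with hypothetical object names, no Mahout involved), the ```App``` form and an explicit-main form are equivalent:

```scala
// The App trait runs the object body at launch (delayed initialization).
object DriverWithApp extends App {
  println("driver body runs at launch")
}

// The equivalent explicit form, closer to Java's main method.
object DriverWithMain {
  def body(): String = "driver body runs at launch"
  def main(args: Array[String]): Unit = println(body())
}
```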
+
+Before we can execute anything on Spark we need to create a context. We 
could use raw Spark calls here, but sensible defaults for a Mahout context are 
set up by using the Mahout helper function:
+
+    implicit val mc = mahoutSparkContext(masterUrl = "local", 
+      appName = "CooccurrenceDriver")
+    
+We need to read in three files containing different interaction types. The 
files will each be read into a Mahout IndexedDataset. This allows us to 
preserve application-specific user and item IDs throughout the calculations.
+
+For example, here is data/purchase.csv:
+
+    u1,iphone
+    u1,ipad
+    u2,nexus
+    u2,galaxy
+    u3,surface
+    u4,iphone
+    u4,galaxy
+
+Mahout has a helper function, SparkEngine.indexedDatasetDFSReadElements, that 
reads text-delimited files. The function reads single-element tuples 
(user-id,item-id) in a distributed way to create the IndexedDataset. 
Distributed Row Matrices (DRM) and Vectors are important data types supplied by 
Mahout, and IndexedDataset is like a very lightweight DataFrame in R; it wraps 
a DRM with HashBiMaps for row and column IDs. 
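+The bidirectional dictionary idea can be sketched in plain Scala (a toy stand-in for Guava's HashBiMap, no Mahout required): application-specific user IDs map to matrix row indices and back.

```scala
// Toy bidirectional ID dictionary: user ID <-> row index.
// A real IndexedDataset uses Guava HashBiMaps for this.
val userToRow: Map[String, Int] = Map("u1" -> 0, "u2" -> 1, "u3" -> 2)
val rowToUser: Map[Int, String] = userToRow.map(_.swap)

// Translate a matrix row back to the application-specific user ID.
val user = rowToUser(1) // "u2"
```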
+
+One important thing to note about this example is that we read in all datasets 
before we adjust the number of rows in them to match the total number of users 
in the data. This is required for the math to work out [(A'A, A'B, 
A'C)](http://mahout.apache.org/users/algorithms/intro-cooccurrence-spark.html): 
even if some users took one action but not another, there must be the same 
number of rows in all matrices.
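+The row-count rule can be illustrated with plain Scala sets (toy user IDs, no Mahout): the row cardinality of every matrix must equal the number of users seen in *any* action.

```scala
// Users seen per action (toy data).
val purchaseUsers = Set("u1", "u2", "u3", "u4")
val viewUsers     = Set("u1", "u2", "u4", "u5")
val categoryUsers = Set("u2", "u5")

// Every matrix is resized to one row per user seen in ANY action.
val numUsers = (purchaseUsers ++ viewUsers ++ categoryUsers).size
println(numUsers) // 5
```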
+
+    /**
+     * Read files of element tuples and create IndexedDatasets one per action. 
These 
+     * share a userID BiMap but have their own itemID BiMaps
+     */
+    def readActions(actionInput: Array[(String, String)]): Array[(String, 
IndexedDataset)] = {
+      var actions = Array[(String, IndexedDataset)]()
+
+      val userDictionary: BiMap[String, Int] = HashBiMap.create()
+
+      // The first action named in the sequence is the "primary" action and 
+      // begins to fill up the user dictionary
+      for ( actionDescription <- actionInput ) {// grab the path to actions
+        val action: IndexedDataset = SparkEngine.indexedDatasetDFSReadElements(
+          actionDescription._2,
+          schema = DefaultIndexedDatasetElementReadSchema,
+          existingRowIDs = userDictionary)
+        userDictionary.putAll(action.rowIDs)
+        // put the name in the tuple with the indexedDataset
+        actions = actions :+ (actionDescription._1, action) 
+      }
+
+      // After all actions are read in, the userDictionary will contain every 
user seen, 
+      // even if they did not take every action. Now we adjust the row rank of 
+      // all IndexedDatasets to have this number of rows
+      // Note: this is very important or the cooccurrence calc may fail
+      val numUsers = userDictionary.size() // the new row cardinality for all matrices
+
+      val resizedNameActionPairs = actions.map { a =>
+        // resize the matrix, in effect, by adding empty rows
+        val resizedMatrix = a._2.create(a._2.matrix, userDictionary, 
a._2.columnIDs).newRowCardinality(numUsers)
+        (a._1, resizedMatrix) // return the Tuple of (name, IndexedDataset)
+      }
+      resizedNameActionPairs // return the array of Tuples
+    }
+
+
+Now that we have the data read in we can perform the cooccurrence calculation.
+
+    // actions.map creates an array of just the IndexedDatasets
+    val indicatorMatrices = SimilarityAnalysis.cooccurrencesIDSs(
+      actions.map(a => a._2)) 
+
+All we need to do now is write the indicators.
+
+    // zip a pair of arrays into an array of pairs, reattaching the action 
names
+    val indicatorDescriptions = actions.map(a => a._1).zip(indicatorMatrices)
+    writeIndicators(indicatorDescriptions)
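+The zip step is plain Scala; a toy version (with hypothetical stand-in values) shows how the action names are reattached to their result matrices:

```scala
// Pair each action name back up with its computed indicator matrix
// (strings stand in for the IndexedDatasets here).
val actionNames = Array("purchase", "view", "category")
val indicators  = Array("purchaseIndicators", "viewIndicators", "categoryIndicators")

val described = actionNames.zip(indicators)
// described(0) == ("purchase", "purchaseIndicators")
```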
+
+
+The ```writeIndicators``` method uses the default write function 
```dfsWrite```.
+
+    /**
+     * Write indicatorMatrices to the output dir in the default format
+     * for indexing by a search engine.
+     */
+    def writeIndicators( indicators: Array[(String, IndexedDataset)]) = {
+      for (indicator <- indicators ) {
+        // create a name based on the type of indicator
+        val indicatorDir = OutputPath + indicator._1
+        indicator._2.dfsWrite(
+          indicatorDir,
+          // Schema tells the writer to omit LLR strengths 
+          // and format for search engine indexing
+          IndexedDatasetWriteBooleanSchema) 
+      }
+    }
+ 
+
+See the GitHub project for the full source. Now we create a ```build.sbt``` to 
build the example. 
+
+    name := "cooccurrence-driver"
+
+    organization := "com.finderbots"
+
+    version := "0.1"
+
+    scalaVersion := "2.10.4"
+
+    val sparkVersion = "1.1.1"
+
+    libraryDependencies ++= Seq(
+      "log4j" % "log4j" % "1.2.17",
+      // Mahout's Spark code
+      "commons-io" % "commons-io" % "2.4",
+      "org.apache.mahout" % "mahout-math-scala_2.10" % "0.10.0",
+      "org.apache.mahout" % "mahout-spark_2.10" % "0.10.0",
+      "org.apache.mahout" % "mahout-math" % "0.10.0",
+      "org.apache.mahout" % "mahout-hdfs" % "0.10.0",
+      // Google collections, AKA Guava
+      "com.google.guava" % "guava" % "16.0")
+
+    resolvers += "typesafe repo" at "http://repo.typesafe.com/typesafe/releases/"
+
+    resolvers += Resolver.mavenLocal
+
+    packSettings
+
+    packMain := Map(
+      "cooc" -> "CooccurrenceDriver")
+
+
+##Build
+To build the example from the project's root folder:
+
+    $ sbt pack
+
+This will automatically set up some launcher scripts for the driver. To run, 
execute:
+
+    $ target/pack/bin/cooc
+    
+The driver will execute in Spark standalone mode and put the data in 
/path/to/3-input-cooc/data/indicators/*indicator-type*
+
+##Using a Debugger
+To build and run this example in a debugger like IntelliJ IDEA, install IDEA 
from the IntelliJ site and add the Scala plugin.
+
+Open IDEA and go to the menu File->New->Project from existing 
sources->SBT->/path/to/3-input-cooc. This will create an IDEA project from 
```build.sbt``` in the root directory.
+
+At this point you may create a "Debug Configuration" to run. In the menu 
choose Run->Edit Configurations. Under "Default" choose "Application". In the 
dialog hit the ellipsis button "..." to the right of "Environment Variables" 
and fill in your versions of JAVA_HOME, SPARK_HOME, and MAHOUT_HOME. In the 
configuration editor, under "Use classpath from", choose the root-3-input-cooc 
module. 
+
+![image](http://mahout.apache.org/images/debug-config.png)
+
+Now choose "Application" in the left pane and hit the plus sign "+". Give the 
config a name and hit the ellipsis button to the right of the "Main class" 
field as shown.
+
+![image](http://mahout.apache.org/images/debug-config-2.png)
+
+
+After setting breakpoints you are now ready to debug the configuration. Go to 
the Run->Debug... menu and pick your configuration. This will execute using a 
local standalone instance of Spark.
+
+##The Mahout Shell
+
+For small script-like apps you may wish to use the Mahout shell. It is a Scala 
REPL-style interactive shell built on the Spark shell with Mahout-Samsara 
extensions.
+
+To turn CooccurrenceDriver.scala into a script, make the following changes:
+
+* You won't need the context, since it is created when the shell is launched; 
comment that line out.
+* Replace the logger.info lines with println.
+* Remove the package info since it's not needed. Save the result as 
```path/to/3-input-cooc/bin/CooccurrenceDriver.mscala```. 
+
+Note the ```.mscala``` extension, which indicates that we are using Mahout's 
Scala extensions for math, otherwise known as 
[Mahout-Samsara](http://mahout.apache.org/users/environment/out-of-core-reference.html).
+
+To run the code, first make sure the output directory does not already exist:
+
+    $ rm -r /path/to/3-input-cooc/data/indicators
+    
+Launch the Mahout + Spark shell:
+
+    $ mahout spark-shell
+    
+You'll see the Mahout splash:
+
+    MAHOUT_LOCAL is set, so we don't add HADOOP_CONF_DIR to classpath.
+
+                         _                 _
+             _ __ ___   __ _| |__   ___  _   _| |_
+            | '_ ` _ \ / _` | '_ \ / _ \| | | | __|
+            | | | | | | (_| | | | | (_) | |_| | |_
+            |_| |_| |_|\__,_|_| |_|\___/ \__,_|\__|  version 0.10.0
+
+      
+    Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 
1.7.0_72)
+    Type in expressions to have them evaluated.
+    Type :help for more information.
+    15/04/26 09:30:48 WARN NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
+    Created spark context..
+    Mahout distributed context is available as "implicit val sdc".
+    mahout> 
+
+To load the driver type:
+
+    mahout> :load /path/to/3-input-cooc/bin/CooccurrenceDriver.mscala
+    Loading ./bin/CooccurrenceDriver.mscala...
+    import com.google.common.collect.{HashBiMap, BiMap}
+    import org.apache.log4j.Logger
+    import org.apache.mahout.math.cf.SimilarityAnalysis
+    import org.apache.mahout.math.indexeddataset._
+    import org.apache.mahout.sparkbindings._
+    import scala.collection.immutable.HashMap
+    defined module CooccurrenceDriver
+    mahout> 
+
+To run the driver type:
+
+    mahout> CooccurrenceDriver.main(args = Array(""))
+    
+You'll get some stats printed:
+
+    Total number of users for all actions = 5
+    purchase indicator matrix:
+      Number of rows for matrix = 4
+      Number of columns for matrix = 5
+      Number of rows after resize = 5
+    view indicator matrix:
+      Number of rows for matrix = 4
+      Number of columns for matrix = 5
+      Number of rows after resize = 5
+    category indicator matrix:
+      Number of rows for matrix = 5
+      Number of columns for matrix = 7
+      Number of rows after resize = 5
+    
+If you look in ```path/to/3-input-cooc/data/indicators``` you should find 
folders containing the indicator matrices.

http://git-wip-us.apache.org/repos/asf/mahout/blob/a60c79e7/website/docs/0.13.0/tutorials/play-with-shell.md
----------------------------------------------------------------------
diff --git a/website/docs/0.13.0/tutorials/play-with-shell.md 
b/website/docs/0.13.0/tutorials/play-with-shell.md
new file mode 100644
index 0000000..0c88839
--- /dev/null
+++ b/website/docs/0.13.0/tutorials/play-with-shell.md
@@ -0,0 +1,198 @@
+---
+layout: mahoutdoc
+title: Playing with Mahout's Spark Shell
+permalink: /docs/0.13.0/tutorials/samsara-spark-shell
+---
+# Playing with Mahout's Spark Shell 
+
+This tutorial will show you how to play with Mahout's Scala DSL for linear 
algebra and its Spark shell. **Please keep in mind that this code is still in a 
very early experimental stage**.
+
+_(Edited for 0.10.2)_
+
+## Intro
+
+We'll use an excerpt of a publicly available [dataset about 
cereals](http://lib.stat.cmu.edu/DASL/Datafiles/Cereals.html). The dataset 
gives the protein, fat, carbohydrate, and sugar content (in milligrams) of a 
set of cereals, as well as a customer rating for each cereal. Our aim for this 
example is to fit a linear model which infers the customer rating from the 
ingredients.
+
+
+Name                    | protein | fat | carbo | sugars | rating
+:-----------------------|:--------|:----|:------|:-------|:---------
+Apple Cinnamon Cheerios | 2       | 2   | 10.5  | 10     | 29.509541
+Cap'n'Crunch            | 1       | 2   | 12    | 12     | 18.042851
+Cocoa Puffs             | 1       | 1   | 12    | 13     | 22.736446
+Froot Loops             | 2       | 1   | 11    | 13     | 32.207582
+Honey Graham Ohs        | 1       | 2   | 12    | 11     | 21.871292
+Wheaties Honey Gold     | 2       | 1   | 16    | 8      | 36.187559
+Cheerios                | 6       | 2   | 17    | 1      | 50.764999
+Clusters                | 3       | 2   | 13    | 7      | 40.400208
+Great Grains Pecan      | 3       | 3   | 13    | 4      | 45.811716
+
+
+## Installing Mahout & Spark on your local machine
+
+We describe how to do a quick toy setup of Spark & Mahout on your local 
machine, so that you can run this example and play with the shell. 
+
+ 1. Download [Apache Spark 
1.6.2](http://d3kbcqa49mib13.cloudfront.net/spark-1.6.2-bin-hadoop2.6.tgz) and 
unpack the archive file
+ 1. Change to the directory where you unpacked Spark and type ```sbt/sbt 
assembly``` to build it
+ 1. Create a directory for Mahout somewhere on your machine, change to it, 
and check out the master branch of Apache Mahout from GitHub: ```git clone 
https://github.com/apache/mahout mahout```
+ 1. Change to the ```mahout``` directory and build Mahout using ```mvn 
-DskipTests clean install```
+ 
+## Starting Mahout's Spark shell
+
+ 1. Go to the directory where you unpacked Spark and type 
```sbin/start-all.sh``` to start Spark locally
+ 1. Open a browser and point it to 
[http://localhost:8080/](http://localhost:8080/) to check whether Spark 
started successfully. Copy the URL of the Spark master at the top of the page 
(it starts with **spark://**)
+ 1. Define the following environment variables: <pre class="codehilite">export 
MAHOUT_HOME=[directory into which you checked out Mahout]
+export SPARK_HOME=[directory where you unpacked Spark]
+export MASTER=[url of the Spark master]
+</pre>
+ 1. Finally, change to the directory where you unpacked Mahout and type 
```bin/mahout spark-shell```; 
+you should see the shell start and get the prompt ```mahout> ```. Check the 
+[FAQ](http://mahout.apache.org/users/sparkbindings/faq.html) for further 
troubleshooting.
+
+## Implementation
+
+We'll use the shell to interactively play with the data and incrementally 
implement a simple [linear 
regression](https://en.wikipedia.org/wiki/Linear_regression) algorithm. Let's 
first load the dataset. Usually, we wouldn't need Mahout unless we processed a 
large dataset stored in a distributed filesystem. But for the sake of this 
example, we'll use our tiny toy dataset and "pretend" it was too big to fit 
onto a single machine.
+
+*Note: You can incrementally follow the example by copy-and-pasting the code 
into your running Mahout shell.*
+
+Mahout's linear algebra DSL has an abstraction called *DistributedRowMatrix 
(DRM)* which models a matrix that is partitioned by rows and stored in the 
memory of a cluster of machines. We use ```dense()``` to create a dense 
in-memory matrix from our toy dataset and use ```drmParallelize``` to load it 
into the cluster, "mimicking" a large, partitioned dataset.
+
+<div class="codehilite"><pre>
+val drmData = drmParallelize(dense(
+  (2, 2, 10.5, 10, 29.509541),  // Apple Cinnamon Cheerios
+  (1, 2, 12,   12, 18.042851),  // Cap'n'Crunch
+  (1, 1, 12,   13, 22.736446),  // Cocoa Puffs
+  (2, 1, 11,   13, 32.207582),  // Froot Loops
+  (1, 2, 12,   11, 21.871292),  // Honey Graham Ohs
+  (2, 1, 16,   8,  36.187559),  // Wheaties Honey Gold
+  (6, 2, 17,   1,  50.764999),  // Cheerios
+  (3, 2, 13,   7,  40.400208),  // Clusters
+  (3, 3, 13,   4,  45.811716)), // Great Grains Pecan
+  numPartitions = 2);
+</pre></div>
+
+Have a look at this matrix. The first four columns represent the ingredients 
+(our features) and the last column (the rating) is the target variable for 
+our regression. [Linear 
regression](https://en.wikipedia.org/wiki/Linear_regression) 
+assumes that the **target variable** `\(\mathbf{y}\)` is generated by the 
+linear combination of **the feature matrix** `\(\mathbf{X}\)` with the 
+**parameter vector** `\(\boldsymbol{\beta}\)` plus the
+ **noise** `\(\boldsymbol{\varepsilon}\)`, summarized in the formula 
+`\(\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon}\)`. 
+Our goal is to find an estimate of the parameter vector 
+`\(\boldsymbol{\beta}\)` that explains the data very well.
+
+As a first step, we extract `\(\mathbf{X}\)` and `\(\mathbf{y}\)` from our 
data matrix. We get *X* by slicing: we take all rows (denoted by ```::```) and 
the first four columns, which contain the ingredients in milligrams. 
Note that the result is again a DRM. The shell will not execute this code yet; 
it saves the history of operations and defers execution until we actually 
access a result. **Mahout's DSL automatically optimizes and parallelizes all 
operations on DRMs and runs them on Apache Spark.**
+
+<div class="codehilite"><pre>
+val drmX = drmData(::, 0 until 4)
+</pre></div>
+
+Next, we extract the target variable vector *y*, the fifth column of the data 
matrix. We assume this one fits into our driver machine, so we fetch it into 
memory using ```collect```:
+
+<div class="codehilite"><pre>
+val y = drmData.collect(::, 4)
+</pre></div>
+
+Now we are ready to think about a mathematical way to estimate the parameter 
vector *β*. A simple textbook approach is [ordinary least squares 
(OLS)](https://en.wikipedia.org/wiki/Ordinary_least_squares), which minimizes 
the sum of residual squares between the true target variable and the prediction 
of the target variable. In OLS, there is even a closed form expression for 
estimating `\(\boldsymbol{\beta}\)` as 
+`\(\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\mathbf{y}\)`.
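+As a plain-Scala sanity check of the closed form (toy numbers, one feature plus a bias column, no Mahout involved), the normal equations `\((\mathbf{X}^{\top}\mathbf{X})\boldsymbol{\beta}=\mathbf{X}^{\top}\mathbf{y}\)` reduce to a 2x2 system we can solve by Cramer's rule:

```scala
// OLS on toy data y ≈ b0 + b1*x, solving (X^T X) beta = X^T y directly.
val x = Array(1.0, 2.0, 3.0)
val y = Array(3.1, 5.0, 7.2)

val n   = x.length.toDouble
val sx  = x.sum
val sxx = x.map(v => v * v).sum
val sy  = y.sum
val sxy = x.zip(y).map { case (a, b) => a * b }.sum

// 2x2 system: [n sx; sx sxx] * [b0; b1] = [sy; sxy], solved by Cramer's rule.
val det = n * sxx - sx * sx
val b0  = (sy * sxx - sx * sxy) / det // intercept: 1.0
val b1  = (n * sxy - sx * sy) / det   // slope: 2.05
```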
+
+The first thing we compute for this is 
`\(\mathbf{X}^{\top}\mathbf{X}\)`. The code for doing this in Mahout's Scala 
DSL maps directly to the mathematical formula. The operation ```.t``` 
transposes a matrix and, analogous to R, ```%*%``` denotes matrix 
multiplication.
+
+<div class="codehilite"><pre>
+val drmXtX = drmX.t %*% drmX
+</pre></div>
+
+The same is true for computing `\(\mathbf{X}^{\top}\mathbf{y}\)`. We can 
simply type the math as Scala expressions into the shell. Here, *X* lives in 
the cluster, while *y* is in the memory of the driver, and the result is a DRM 
again.
+<div class="codehilite"><pre>
+val drmXty = drmX.t %*% y
+</pre></div>
+
+We're nearly done. The next step is to fetch 
`\(\mathbf{X}^{\top}\mathbf{X}\)` and 
+`\(\mathbf{X}^{\top}\mathbf{y}\)` into the memory of our driver machine (we 
are targeting 
+feature matrices that are tall and skinny, 
+so we can assume that `\(\mathbf{X}^{\top}\mathbf{X}\)` is small enough 
+to fit in). Then, we provide them to an in-memory solver (Mahout provides 
+an analog to R's ```solve()``` for that) which computes ```beta```, our 
+OLS estimate of the parameter vector `\(\boldsymbol{\beta}\)`.
+
+<div class="codehilite"><pre>
+val XtX = drmXtX.collect
+val Xty = drmXty.collect(::, 0)
+
+val beta = solve(XtX, Xty)
+</pre></div>
+
+That's it! We have implemented a distributed linear regression algorithm 
+on Apache Spark. I hope you agree that we didn't have to worry much about 
+parallelization and distributed systems. The goal of Mahout's linear algebra 
+DSL is to abstract away the ugliness of programming a distributed system 
+as much as possible, while still retaining decent performance and 
+scalability.
+
+We can now check how well our model fits its training data. 
+First, we multiply the feature matrix `\(\mathbf{X}\)` by our estimate of 
+`\(\boldsymbol{\beta}\)`. Then, we look at the difference (via L2-norm) of 
+the target variable `\(\mathbf{y}\)` to the fitted target variable:
+
+<div class="codehilite"><pre>
+val yFitted = (drmX %*% beta).collect(::, 0)
+(y - yFitted).norm(2)
+</pre></div>
+
+We hope we have shown that Mahout's shell allows people to interactively 
and incrementally write algorithms. We have entered a lot of individual 
commands, one by one, until we got the desired results. We can now refactor a 
little by wrapping our statements into easy-to-use functions. The definition of 
functions follows standard Scala syntax. 
+
+We put all the commands for ordinary least squares into a function ```ols```. 
+
+<div class="codehilite"><pre>
+def ols(drmX: DrmLike[Int], y: Vector) = 
+  solve(drmX.t %*% drmX, drmX.t %*% y)(::, 0)
+
+</pre></div>
+
+Note that the DSL performs an implicit `collect` if coercion rules require an 
in-core argument. Hence, we can simply skip explicit `collect`s. 
+
+Next, we define a function ```goodnessOfFit``` that tells how well a model 
fits the target variable:
+
+<div class="codehilite"><pre>
+def goodnessOfFit(drmX: DrmLike[Int], beta: Vector, y: Vector) = {
+  val fittedY = (drmX %*% beta).collect(::, 0)
+  (y - fittedY).norm(2)
+}
+</pre></div>
+
+So far we have left out an important aspect of a standard linear regression 
+model. Usually there is a constant bias term added to the model. Without 
+it, our model always passes through the origin and we only learn the 
+slope. An easy way to add such a bias term to our model is to add a 
+column of ones to the feature matrix `\(\mathbf{X}\)`. 
+The corresponding weight in the parameter vector will then be the bias term.
+
+Here is how we add a bias column:
+
+<div class="codehilite"><pre>
+val drmXwithBiasColumn = drmX cbind 1
+</pre></div>
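+In plain-Scala terms (toy 2x2 data, no Mahout), ```cbind 1``` appends a constant column of ones to each row:

```scala
// Append a bias column of ones to a toy 2x2 feature "matrix".
val features = Array(Array(2.0, 10.5), Array(1.0, 12.0))
val withBias = features.map(row => row :+ 1.0)
// withBias(0) == Array(2.0, 10.5, 1.0)
```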
+
+Now we can give the newly created DRM ```drmXwithBiasColumn``` to our model 
fitting method ```ols``` and see how well the resulting model fits the training 
data with ```goodnessOfFit```. You should see a large improvement in the result.
+
+<div class="codehilite"><pre>
+val betaWithBiasTerm = ols(drmXwithBiasColumn, y)
+goodnessOfFit(drmXwithBiasColumn, betaWithBiasTerm, y)
+</pre></div>
+
+As a further optimization, we can make use of the DSL's caching functionality. 
We use ```drmXwithBiasColumn``` repeatedly as input to computations, so it 
might be beneficial to cache it in memory. This is achieved by calling 
```checkpoint()```. At the end, we remove it from the cache with ```uncache```:
+
+<div class="codehilite"><pre>
+val cachedDrmX = drmXwithBiasColumn.checkpoint()
+
+val betaWithBiasTerm = ols(cachedDrmX, y)
+val goodness = goodnessOfFit(cachedDrmX, betaWithBiasTerm, y)
+
+cachedDrmX.uncache()
+
+goodness
+</pre></div>
+
+
+Liked what you saw? Check out Mahout's overview of the [Scala and Spark 
bindings](https://mahout.apache.org/users/sparkbindings/home.html).
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/mahout/blob/a60c79e7/website/index.html
----------------------------------------------------------------------
diff --git a/website/index.html b/website/index.html
deleted file mode 100644
index 21b2f4b..0000000
--- a/website/index.html
+++ /dev/null
@@ -1,5 +0,0 @@
----
-layout: default
----
-
-<!-- BLANK -->

http://git-wip-us.apache.org/repos/asf/mahout/blob/a60c79e7/website/index.md
----------------------------------------------------------------------
diff --git a/website/index.md b/website/index.md
new file mode 100644
index 0000000..b9e3708
--- /dev/null
+++ b/website/index.md
@@ -0,0 +1,164 @@
+---
+layout: default
+theme: mahout
+---
+
+<div class="jumbotron">
+  <div class="container">
+    <h1>Apache Mahout - DRAFT </h1>
+    <p>A distributed linear algebra framework that runs on Spark, Flink, GPUs, 
and more!<br/>
+      Use Mahout's library of machine learning algorithms or roll your own!  
Use Mahout-Samsara to write matrix
+      algebra using R-like syntax.  Check out our tutorials and quick start 
guide to get rolling.
+    </p>
+    <div class="border row">
+      <div class="col-md-12 col-sm-12 col-xs-12 text-center newBtn">
+        <a href="http://youtube.com" target="_zeppelinVideo" class="btn 
btn-primary btn-lg bigFingerButton" role="button">Tutorial Video</a>
+        <a href="https://github.com/apache/mahout" class="btn btn-primary 
btn-lg bigFingerButton" role="button">GET LATEST MAHOUT</a>
+      </div>
+    </div>
+  </div>
+</div>  
+
+<!-- 3 wide column -->
+
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+
+
+<div class="new">
+  <div class="container">
+    <h2>Latest Release</h2>
+    <span class="newZeppelin center-block">Apache Mahout 0.13.0</span>
+    <div class="border row">
+      <div class="border col-md-4 col-sm-4">
+        <h4>Simple and <br/>Extensible</h4>
+        <div class="viz">
+          <p>
+            Build your own algorithms using Mahout's R-like interface.  See an 
example in this 
+            <a href="" target="_blank">demo</a>
+          </p>
+        </div>
+      </div>
+      <div class="border col-md-4 col-sm-4">
+        <h4>Support for Multiple <br/>Distributed Backends</h4>
+        <div class="multi">
+        <p>
+           Custom bindings for Spark, Flink, and H2O enable a write-once, 
run-anywhere machine learning platform.
+          <a class="thumbnail text-center" href="#thumb">
+            See more in this DEMO.
+            <span><img src="./assets/themes/zeppelin/img/scope.gif" 
style="max-width: 55vw" /></span>
+          </a> 
+        </p>
+        </div>
+      </div>
+      <div class="border col-md-4 col-sm-4">
+        <h4>Introducing Samsara, an R-like<br/> DSL for writing ML 
algorithms</h4>
+        <div class="personal">
+        <p>
+          Use this capability to write algorithms at scale that will run on 
any backend. 
+        </p>
+        </div>
+      </div>
+    </div>
+    <div class="border row">
+      <div class="border col-md-4 col-sm-4">
+        <h4>Support for GPUs</h4>
+        <p>
+          Distributed GPU Matrix-Matrix and Matrix-Vector multiplication on 
Spark along with sparse and dense matrix GPU-backed support.
+        </p>
+      </div>
+      <div class="border col-md-4 col-sm-4">
+        <h4>Extensible Algorithms Framework</h4>
+        <p>
+           A new scikit-learn-like framework for algorithms, with the goal of
+           creating a consistent API for various machine learning algorithms
+        </p>
+      </div>
+      <div class="border col-md-4 col-sm-4">
+        <h4>0.13.1 - Future Plans</h4>
+        <p>
+          Further native integration:
+        </p>
+        <ul>
+          <li>JCuda backing for in-core matrices and CUDA solvers</li>
+          <li>GPU/OpenMP acceleration for linear solvers</li>
+          <li>Scala 2.11 support</li>
+          <li>Spark 2.x support</li>
+        </ul>
+      </div>
+    </div>
+    <div class="col-md-12 col-sm-12 col-xs-12 text-center">
+      <p style="text-align:center; margin-top: 32px; font-size: 14px; color: 
gray; font-weight: 200; font-style: italic; padding-bottom: 0;">See more 
details in 
+        <a href="tbd">0.13.0 Release Note</a>
+      </p>
+    </div>
+  </div>
+</div>
+
+      <!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+
+
+        <div class="container">
+            <div class="row">
+                <div class="col-md-12">
+                
+
+                </div>
+            </div>
+            <div class="row">
+                <div class="col-md-12">
+                    {% for post in paginator.posts %}
+                        {% include tile.html %}
+                    {% endfor %}
+
+
+                    
+                </div>
+            </div>
+        </div>
+
+
+
+<div class="new">
+  <div class="container">
+    <h2>Mahout on Twitter</h2>
+    <br/>
+    <div class="row">
+      <div class="col-md-12 col-sm-12 col-xs-12 text-center">
+        <div class='jekyll-twitter-plugin'><a class="twitter-timeline" 
data-width="500" data-tweet-limit="4" data-chrome="nofooter" 
href="https://twitter.com/ApacheMahout">Tweets by ApacheMahout</a>
+<script async src="//platform.twitter.com/widgets.js" 
charset="utf-8"></script></div>
+      </div>
+      <div class="col-md-12 col-sm-12 col-xs-12 text-center twitterBtn">
+        <p style="text-align:center; margin-top: 32px; font-size: 12px; color: 
gray; font-weight: 200; font-style: italic; padding-bottom: 0;">See more tweets 
or</p>
+        <a href="https://twitter.com/ApacheMahout" target="_blank" class="btn 
btn-primary btn-lg round" role="button">
+          Follow Mahout on &nbsp;
+          <i class="fa fa-twitter fa-lg" aria-hidden="true"></i>
+        </a>
+      </div>
+    </div>
+  </div>
+  <hr>
+</div>
