WEBSITE Added Overview to docs

Project: http://git-wip-us.apache.org/repos/asf/mahout/repo
Commit: http://git-wip-us.apache.org/repos/asf/mahout/commit/c4feca03
Tree: http://git-wip-us.apache.org/repos/asf/mahout/tree/c4feca03
Diff: http://git-wip-us.apache.org/repos/asf/mahout/diff/c4feca03

Branch: refs/heads/master
Commit: c4feca039d93cfa10c074d732511ec3dfa86b69e
Parents: fc43340
Author: rawkintrevo <[email protected]>
Authored: Thu May 4 12:36:21 2017 -0500
Committer: rawkintrevo <[email protected]>
Committed: Thu May 4 12:36:21 2017 -0500

----------------------------------------------------------------------
 website/docs/History.markdown      |  16 -----
 website/docs/_includes/navbar.html |   1 +
 website/docs/index.md              | 106 +++++++++++++++++++++++++++++++-
 3 files changed, 104 insertions(+), 19 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mahout/blob/c4feca03/website/docs/History.markdown
----------------------------------------------------------------------
diff --git a/website/docs/History.markdown b/website/docs/History.markdown
deleted file mode 100755
index 5ef89c1..0000000
--- a/website/docs/History.markdown
+++ /dev/null
@@ -1,16 +0,0 @@
-## HEAD
-
-### Major Enhancements
-
-### Minor Enahncements
-  * Add `drafts` folder support (#167)
-  * Add `excerpt` support (#168)
-  * Create History.markdown to help project management (#169)
-
-### Bug Fixes
-
-### Site Enhancements
-
-### Compatibility updates
-  * Update `preview` task
-

http://git-wip-us.apache.org/repos/asf/mahout/blob/c4feca03/website/docs/_includes/navbar.html
----------------------------------------------------------------------
diff --git a/website/docs/_includes/navbar.html b/website/docs/_includes/navbar.html
index 695cef3..c8a0cf6 100644
--- a/website/docs/_includes/navbar.html
+++ b/website/docs/_includes/navbar.html
@@ -12,6 +12,7 @@
         <li id="dropdown">
             <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Key Concepts<span class="caret"></span></a>
             <ul class="dropdown-menu">
+                <li><a href="{{ BASE_PATH }}/index.html">Mahout Overview</a></li>
                 <li><span><b>&nbsp;&nbsp;Scala DSL</b><span></li>
                 <li><a href="{{ BASE_PATH }}/mahout-samsara/in-core-reference.html">In-core Reference</a></li>
                 <li><a href="{{ BASE_PATH }}/mahout-samsara/out-of-core-reference.html">Out-of-core Reference</a></li>

http://git-wip-us.apache.org/repos/asf/mahout/blob/c4feca03/website/docs/index.md
----------------------------------------------------------------------
diff --git a/website/docs/index.md b/website/docs/index.md
index 2c4fcd0..9d7f667 100755
--- a/website/docs/index.md
+++ b/website/docs/index.md
@@ -1,10 +1,110 @@
 ---
 layout: page
 title: Welcome to the Docs
-tagline: Two men enter- one man leaves
+tagline: Apache Mahout from 30,000 feet (10,000 meters)
 ---
 
-This is just a stub. 
 
-But it would be nice to have maybe some sort of over view of what's going on.
+You've probably already noticed Mahout has a lot of things going on at different levels, and it can be hard to know where
+to start.  Let's provide an overview to help you see how the pieces fit together. In general the stack is something like this:
+
+1. Application Code
+1. Samsara Scala-DSL (Syntactic Sugar)
+1. Logical/Physical DAG
+1. Engine Bindings
+1. Code runs in Engine
+1. Native Solvers 
+
+## Application Code
+
+You have a Java/Scala application (skip this if you're working from an interactive shell or Apache Zeppelin):
+
+    
+    def main(args: Array[String]) {
+
+      println("Welcome to My Mahout App")
+
+      if (args.isEmpty) {
+
+This may seem like a trivial part to call out, but the point is important: Mahout runs _inline_ with your regular application 
+code. For example, if this is an Apache Spark app, then you do all your Spark things, including ETL and data prep, in the same 
+application, and then invoke Mahout's mathematically expressive Scala DSL when you're ready to math on it.
+
+## Samsara Scala-DSL (Syntactic Sugar)
+
+So when you get to a point in your code where you're ready to math it up (in this example, on Spark), you can elegantly express 
+yourself mathematically.
+
+    implicit val sdc: org.apache.mahout.sparkbindings.SparkDistributedContext = sc2sdc(sc)
+    
+    val A = drmWrap(rddA)
+    val B = drmWrap(rddB) 
+    
+    val C = A.t %*% A + A %*% B.t
+    
+We've defined a `SparkDistributedContext` (which is a wrapper on the Spark Context), and two Distributed Row Matrices (DRMs)
+which are wrappers around RDDs (in Spark).  
+
+## Logical / Physical DAG
+
+At this point there is a bit of optimization that happens.  For example, consider the expression
+    
+    A.t %*% A
+    
+Which is 
+<center>\(\mathbf{A^\intercal A}\)</center>
+
+Transposing a large matrix is a very expensive thing to do, and in this case we don't actually need to do it. There is a
+more efficient way to calculate <foo>\(\mathbf{A^\intercal A}\)</foo> that doesn't require a physical transpose. 
+
+(Image showing this)
+
+Mahout converts this code into something that looks like:
+
+    OpAtA(A) + OpABt(A, B) //  illustrative pseudocode with real functions called
+
+There's a little more magic that happens at this level, but the punchline is _Mahout translates the pretty Scala into
+a series of operators, which at the next level are implemented by the engine_.
+
+## Engine Bindings and Engine Level Ops
+
+When one creates new engine bindings, one is in essence defining:
+
+1. The engine-specific underlying structure for a DRM (in Spark it's an RDD).  The underlying structure also has 
+rows of `MahoutVector`s, so in Spark `RDD[(index, MahoutVector)]`.  This will be important when we get to the native solvers. 
+1. A set of BLAS (Basic Linear Algebra Subprograms) functions for working on the underlying structure; in Spark this means
+implementing things like `AtA` on an RDD. See [the sparkbindings on github](https://github.com/apache/mahout/tree/master/spark/src/main/scala/org/apache/mahout/sparkbindings)
+
+Now your mathematically expressive Samsara Scala code has been translated into optimized engine-specific functions.
+
+## Native Solvers
+
+Recall how I said the rows of the DRMs are `org.apache.mahout.math.Vector`.  Here is where this becomes important. I'm going 
+to explain this in the context of Spark, but the principles apply to all distributed backends. 
+
+If you are familiar with how mapping and reducing work in Spark, then envision this RDD of `MahoutVector`s: each partition, 
+an indexed collection of vectors, is a _block_ of the distributed matrix. However, this _block_ is entirely in-core, and therefore
+is treated like an in-core matrix. 
+
+Now Mahout defines its own in-core BLAS packs and refers to them as _Native Solvers_.  The default native solver is just plain
+old JVM, which is painfully slow, but works just about anywhere.  
+
+When the data gets to the node, an operation on the matrix block is called. In the same way Mahout converts abstract
+operators on the DRM into operations implemented on various distributed engines, it calls abstract operators on the in-core matrix 
+and vectors, which are implemented on various native solvers. 
+
+The default "native solver" is the JVM, which isn't native at all, and if no actual native solvers are present, operations 
+will fall back to it. However, if a native solver is present (the jar was added to the notebook), then the magic will happen.
+
+Imagine we still have our Spark executor: it holds this block of a matrix in memory. Now let's suppose the `ViennaCL-OMP`
+native solver is in use.  When Spark calls an operation on this in-core matrix, the matrix is dumped out of the JVM and the 
+calculation is carried out on _all available CPUs_. 
+
+In a similar way, the `ViennaCL` native solver dumps the matrix out of the JVM and looks for a GPU to execute the operations on.
+ 
+Once the operations are complete, the result is loaded back up into the JVM and Spark (or whatever distributed engine is in use), and 
+shipped back to the driver. 
+
+The native solver operations are only defined on `org.apache.mahout.math.Vector` and `org.apache.mahout.math.Matrix`, which is 
+why it is critical that the underlying structure is composed row-wise of `Vector` or `Matrix` instances. 
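
A note on the `A.t %*% A` paragraph in index.md above: the claim that no physical transpose is needed follows from the identity A^T A = sum over i of (a_i a_i^T), where a_i is the i-th row of A. The sketch below is plain Scala with no Mahout dependency. The names `AtASketch`, `ata`, `transpose` and `multiply` are hypothetical, chosen only for illustration. This is not Mahout's `OpAtA` implementation, just one way to reach the same result by streaming over rows.

```scala
object AtASketch {
  type Matrix = Array[Array[Double]]

  // Reference path: physically build A^T, then multiply. Shown only for comparison.
  def transpose(a: Matrix): Matrix =
    if (a.isEmpty) Array.empty[Array[Double]]
    else Array.tabulate(a.head.length, a.length)((j, i) => a(i)(j))

  def multiply(x: Matrix, y: Matrix): Matrix =
    Array.tabulate(x.length, y.head.length) { (i, j) =>
      x(i).indices.map(k => x(i)(k) * y(k)(j)).sum
    }

  // Transpose-free path: accumulate the outer product of each row with itself.
  // Rows are consumed one at a time, so A^T is never materialized.
  // n is the number of columns of A (an assumption the caller must supply).
  def ata(rows: Iterator[Array[Double]], n: Int): Matrix = {
    val acc = Array.ofDim[Double](n, n)
    for (row <- rows; i <- 0 until n; j <- 0 until n)
      acc(i)(j) += row(i) * row(j)
    acc
  }

  def main(args: Array[String]): Unit = {
    val a: Matrix = Array(Array(1.0, 2.0), Array(3.0, 4.0), Array(5.0, 6.0))
    val viaTranspose = multiply(transpose(a), a)
    val viaRows = ata(a.iterator, 2)
    // Both paths must agree entry by entry.
    assert(viaTranspose.map(_.toSeq).toSeq == viaRows.map(_.toSeq).toSeq)
    println(viaRows.map(_.mkString(" ")).mkString("\n"))
  }
}
```

Because each row contributes independently, the same accumulation can in principle be done per partition and the partial n-by-n results summed, which is why a row-wise distributed layout suits this operation.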
 
