WEBSITE Added Overview to docs
Project: http://git-wip-us.apache.org/repos/asf/mahout/repo
Commit: http://git-wip-us.apache.org/repos/asf/mahout/commit/c4feca03
Tree: http://git-wip-us.apache.org/repos/asf/mahout/tree/c4feca03
Diff: http://git-wip-us.apache.org/repos/asf/mahout/diff/c4feca03

Branch: refs/heads/master
Commit: c4feca039d93cfa10c074d732511ec3dfa86b69e
Parents: fc43340
Author: rawkintrevo <[email protected]>
Authored: Thu May 4 12:36:21 2017 -0500
Committer: rawkintrevo <[email protected]>
Committed: Thu May 4 12:36:21 2017 -0500

----------------------------------------------------------------------
 website/docs/History.markdown      |  16 -----
 website/docs/_includes/navbar.html |   1 +
 website/docs/index.md              | 106 +++++++++++++++++++++++++++++++-
 3 files changed, 104 insertions(+), 19 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/mahout/blob/c4feca03/website/docs/History.markdown
----------------------------------------------------------------------
diff --git a/website/docs/History.markdown b/website/docs/History.markdown
deleted file mode 100755
index 5ef89c1..0000000
--- a/website/docs/History.markdown
+++ /dev/null
@@ -1,16 +0,0 @@
-## HEAD
-
-### Major Enhancements
-
-### Minor Enahncements
- * Add `drafts` folder support (#167)
- * Add `excerpt` support (#168)
- * Create History.markdown to help project management (#169)
-
-### Bug Fixes
-
-### Site Enhancements
-
-### Compatibility updates
- * Update `preview` task
-

http://git-wip-us.apache.org/repos/asf/mahout/blob/c4feca03/website/docs/_includes/navbar.html
----------------------------------------------------------------------
diff --git a/website/docs/_includes/navbar.html b/website/docs/_includes/navbar.html
index 695cef3..c8a0cf6 100644
--- a/website/docs/_includes/navbar.html
+++ b/website/docs/_includes/navbar.html
@@ -12,6 +12,7 @@
 <li id="dropdown">
 <a href="#" class="dropdown-toggle" data-toggle="dropdown" role="button" aria-haspopup="true" aria-expanded="false">Key Concepts<span class="caret"></span></a>
 <ul class="dropdown-menu">
+ <li><a href="{{ BASE_PATH }}/index.html">Mahout Overview</a></li>
 <li><span><b> Scala DSL</b><span></li>
 <li><a href="{{ BASE_PATH }}/mahout-samsara/in-core-reference.html">In-core Reference</a></li>
 <li><a href="{{ BASE_PATH }}/mahout-samsara/out-of-core-reference.html">Out-of-core Reference</a></li>

http://git-wip-us.apache.org/repos/asf/mahout/blob/c4feca03/website/docs/index.md
----------------------------------------------------------------------
diff --git a/website/docs/index.md b/website/docs/index.md
index 2c4fcd0..9d7f667 100755
--- a/website/docs/index.md
+++ b/website/docs/index.md
@@ -1,10 +1,110 @@
 ---
 layout: page
 title: Welcome to the Docs
-tagline: Two men enter- one man leaves
+tagline: Apache Mahout from 30,000 feet (10,000 meters)
 ---
-This is just a stub.
-But it would be nice to have maybe some sort of over view of what's going on.
+You've probably already noticed Mahout has a lot of things going on at different levels, and it can be hard to know where
+to start. Let's provide an overview to help you see how the pieces fit together. In general the stack is something like this:
+
+1. Application Code
+1. Samsara Scala-DSL (Syntactic Sugar)
+1. Logical/Physical DAG
+1. Engine Bindings
+1. Code runs in Engine
+1. Native Solvers
+
+## Application Code
+
+You have an JAVA/Scala applicatoin (skip this if you're working from an interactive shell or Apache Zeppelin)
+
+
+    def main(args: Array[String]) {
+
+      println("Welcome to My Mahout App")
+
+      if (args.isEmpty) {
+
+This may seem like a trivial part to call out, but the point is important- Mahout runs _inline_ with your regular application
+code. E.g. if this is an Apache Spark app, then you do all your Spark things, including ETL and data prep in the same
+application, and then invoke Mahout's mathematically expressive Scala DSL when you're ready to math on it.
+
+## Samsara Scala-DSL (Syntactic Sugar)
+
+So when you get to a point in your code where you're ready to math it up (in this example Spark) you can elegently express
+yourself mathematically.
+
+    implicit val sdc: org.apache.mahout.sparkbindings.SparkDistributedContext = sc2sdc(sc)
+
+    val A = drmWrap(rddA)
+    val B = drmWrap(rddB)
+
+    val C = A.t %*% A + A %*% B.t
+
+We've defined a `MahoutDistributedContext` (which is a wrapper on the Spark Context), and two Disitributed Row Matrices (DRMs)
+which are wrappers around RDDs (in Spark).
+
+## Logical / Physical DAG
+
+At this point there is a bit of optimization that happens. For example, consider the
+
+    A.t %*% A
+
+Which is
+<center>\(\mathbf{A^\intercal A}\)</center>
+
+Transposing a large matrix is a very expensive thing to do, and in this case we don't actually need to do it. There is a
+more efficient way to calculate <foo>\(\mathbf{A^\intercal A}\)</foo> that doesn't require a physical transpose.
+
+(Image showing this)
+
+Mahout converts this code into something that looks like:
+
+    OpAtA(A) + OpABt(A, B) // illustrative pseudocode with real functions called
+
+There's a little more magic that happens at this level, but the punchline is _Mahout translates the pretty scala into a
+a series of operators, which at the next level are turned implemented at the engine_.
+
+## Engine Bindings and Engine Level Ops
+
+When one creates new engine bindings, one is in essence defining
+
+1. What the engine specific underlying structure for a DRM is (in Spark its an RDD). The underlying structure also has
+rows of `MahoutVector`s, so in Spark `RDD[(index, MahoutVector)]`. This will be important when we get to the native solvers.
+1. Implementing a set of BLAS (basic linear algebra) functions for working on the underlying structure- in Spark this means
+implementing things like `AtA` on an RDD. See [the sparkbindings on github](https://github.com/apache/mahout/tree/master/spark/src/main/scala/org/apache/mahout/sparkbindings)
+
+Now your mathematically expresive Samsara Scala code has been translated into optimized engine specific functions.
+
+## Native Solvers
+
+Recall how I said the rows of the DRMs are `org.apache.mahout.math.Vector`. Here is where this becomes important. I'm going
+to explain this in the context of Spark, but the principals apply to all distributed backends.
+
+If you are familiar with how mapping and reducing in Spark, then envision this RDD of `MahoutVector`s, each partition,
+and indexed collection of vectors is a _block_ of the distributed matrix, however this _block_ is totally incore, and therefor
+is treated like an in core matrix.
+
+Now Mahout defines its own incore BLAS packs and refers to them as _Native Solvers_. The default native solver is just plain
+old JVM, which is painfully slow, but works just about anywhere.
+
+When the data gets to the node and an operation on the matrix block is called. In the same way Mahout converts abstract
+operators on the DRM that are implemented on various distributed engines, it calls abstract operators on the incore matrix
+and vectors which are implemented on various native solvers.
+
+The default "native solver" is the JVM, which isn't native at all- and if no actual native solvers are present operations
+will fall back to this. However, IF a native solver is present (the jar was added to the notebook), then the magic will happen.
+
+Imagine still we have our Spark executor- it has this block of a matrix sitting in its core. Now let's suppose the `ViennaCl-OMP`
+native solver is in use. When Spark calls an operation on this incore matrix, the matrix dumps out of the JVM and the
+calculation is carried out on _all available CPUs_.
+
+In a similar way, the `ViennaCL` native solver dumps the matrix out of the JVM and looks for a GPU to execute the operations on.
+
+Once the operations are complete, the result is loaded back up into the JVM, and Spark (or whatever distributed engine) and
+shipped back to the driver.
+
+The native solver operatoins are only defined on `org.apache.mahout.math.Vector` and `org.apache.mahout.math.Matrix`, which is
+why it is critical that the underlying structure composed row-wise of `Vector` or `Matrices`.
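
Note on the "Logical / Physical DAG" section added in this commit: the claim that A^T A can be computed without a physical transpose can be illustrated with plain Scala collections. The sketch below is not Mahout's `OpAtA` implementation; `AtASketch`, `gramFromRows`, and `gramWithTranspose` are hypothetical names introduced only for illustration. It relies on the identity that A^T A equals the sum over rows a_i of the outer product a_i a_i^T, so each row (or each row-wise block of a DRM) can contribute independently.

```scala
// Minimal sketch, assuming a small dense matrix held as a sequence of rows.
// Names here are hypothetical and are not part of the Mahout API.
object AtASketch {
  type Row = Array[Double]

  // A^T A as a sum of per-row outer products; A is never transposed.
  def gramFromRows(rows: Seq[Row]): Array[Array[Double]] = {
    require(rows.nonEmpty, "need at least one row")
    val n = rows.head.length
    val acc = Array.ofDim[Double](n, n)
    for (row <- rows) {
      require(row.length == n, "all rows must have the same length")
      for (i <- 0 until n; j <- 0 until n) {
        acc(i)(j) += row(i) * row(j) // outer product contribution of this row
      }
    }
    acc
  }

  // Reference version that materializes the transpose first, for comparison.
  def gramWithTranspose(rows: Seq[Row]): Array[Array[Double]] = {
    val a = rows.toArray
    val at = a.transpose // each element of `at` is one column of A
    at.map { col =>
      a.head.indices.map { j =>
        col.indices.map(k => col(k) * a(k)(j)).sum
      }.toArray
    }
  }
}
```

Because every row contributes independently, the per-row products can be accumulated partition by partition and then summed, which is the property that makes a transpose-free operator attractive on a row-partitioned structure.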
