Repository: mahout
Updated Branches:
  refs/heads/asf-site be7bef00c -> bdaf56d26


replace \( and \) with $$ in /latest dir (double dollar signs)


Project: http://git-wip-us.apache.org/repos/asf/mahout/repo
Commit: http://git-wip-us.apache.org/repos/asf/mahout/commit/bdaf56d2
Tree: http://git-wip-us.apache.org/repos/asf/mahout/tree/bdaf56d2
Diff: http://git-wip-us.apache.org/repos/asf/mahout/diff/bdaf56d2

Branch: refs/heads/asf-site
Commit: bdaf56d2692396ab13791e92ed617b7eb5c5fcd3
Parents: be7bef0
Author: Andrew Palumbo <[email protected]>
Authored: Sun Dec 24 14:12:10 2017 -0800
Committer: Andrew Palumbo <[email protected]>
Committed: Sun Dec 24 14:13:12 2017 -0800

----------------------------------------------------------------------
 .../algorithms/linear-algebra/d-spca.html       | 42 ++++++++---------
 .../algorithms/linear-algebra/d-ssvd.html       | 48 ++++++++++----------
 .../map-reduce/classification/bayesian.html     | 34 +++++++-------
 .../clustering/spectral-clustering.html         | 12 ++---
 docs/latest/algorithms/reccomenders/d-als.html  |  2 +-
 docs/latest/algorithms/regression/ols.html      |  4 +-
 docs/latest/index.html                          |  4 +-
 .../mahout-samsara/in-core-reference.html       | 14 +++---
 .../mahout-samsara/out-of-core-reference.html   |  6 +--
 .../tutorials/samsara/play-with-shell.html      | 38 ++++++++--------
 .../tutorials/samsara/spark-naive-bayes.html    | 34 +++++++-------
 11 files changed, 119 insertions(+), 119 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mahout/blob/bdaf56d2/docs/latest/algorithms/linear-algebra/d-spca.html
----------------------------------------------------------------------
diff --git a/docs/latest/algorithms/linear-algebra/d-spca.html 
b/docs/latest/algorithms/linear-algebra/d-spca.html
index 9809d7d..5489626 100644
--- a/docs/latest/algorithms/linear-algebra/d-spca.html
+++ b/docs/latest/algorithms/linear-algebra/d-spca.html
@@ -167,36 +167,36 @@
     <div class="col-lg-8">
       <h2 id="intro">Intro</h2>
 
-<p>Mahout has a distributed implementation of Stochastic PCA<a href="Lyubimov 
and Palumbo, [&quot;Apache Mahout: Beyond MapReduce; Distributed Algorithm 
Design&quot;](https://www.amazon.com/Apache-Mahout-MapReduce-Dmitriy-Lyubimov/dp/1523775785)">1</a>.
 This algorithm computes the exact equivalent of Mahout’s dssvd(<code 
class="highlighter-rouge">\(\mathbf{A-1\mu^\top}\)</code>) by modifying the 
<code class="highlighter-rouge">dssvd</code> algorithm so as to avoid forming 
<code class="highlighter-rouge">\(\mathbf{A-1\mu^\top}\)</code>, which would 
densify a sparse input. Thus, it is suitable for work with both dense and 
sparse inputs.</p>
+<p>Mahout has a distributed implementation of Stochastic PCA<a href="Lyubimov 
and Palumbo, [&quot;Apache Mahout: Beyond MapReduce; Distributed Algorithm 
Design&quot;](https://www.amazon.com/Apache-Mahout-MapReduce-Dmitriy-Lyubimov/dp/1523775785)">1</a>.
 This algorithm computes the exact equivalent of Mahout’s dssvd(<code 
class="highlighter-rouge">$$\mathbf{A-1\mu^\top}$$</code>) by modifying the 
<code class="highlighter-rouge">dssvd</code> algorithm so as to avoid forming 
<code class="highlighter-rouge">$$\mathbf{A-1\mu^\top}$$</code>, which would 
densify a sparse input. Thus, it is suitable for work with both dense and 
sparse inputs.</p>
 
 <h2 id="algorithm">Algorithm</h2>
 
-<p>Given an <em>m</em> <code class="highlighter-rouge">\(\times\)</code> 
<em>n</em> matrix <code class="highlighter-rouge">\(\mathbf{A}\)</code>, a 
target rank <em>k</em>, and an oversampling parameter <em>p</em>, this 
procedure computes a <em>k</em>-rank PCA by finding the unknowns in <code 
class="highlighter-rouge">\(\mathbf{A−1\mu^\top \approx U\Sigma 
V^\top}\)</code>:</p>
+<p>Given an <em>m</em> <code class="highlighter-rouge">$$\times$$</code> 
<em>n</em> matrix <code class="highlighter-rouge">$$\mathbf{A}$$</code>, a 
target rank <em>k</em>, and an oversampling parameter <em>p</em>, this 
procedure computes a <em>k</em>-rank PCA by finding the unknowns in <code 
class="highlighter-rouge">$$\mathbf{A−1\mu^\top \approx U\Sigma 
V^\top}$$</code>:</p>
 
 <ol>
-  <li>Create seed for random <em>n</em> <code 
class="highlighter-rouge">\(\times\)</code> <em>(k+p)</em> matrix <code 
class="highlighter-rouge">\(\Omega\)</code>.</li>
-  <li><code class="highlighter-rouge">\(\mathbf{s_\Omega \leftarrow 
\Omega^\top \mu}\)</code>.</li>
-  <li><code class="highlighter-rouge">\(\mathbf{Y_0 \leftarrow A\Omega − 1 
{s_\Omega}^\top, Y \in \mathbb{R}^{m\times(k+p)}}\)</code>.</li>
-  <li>Column-orthonormalize <code class="highlighter-rouge">\(\mathbf{Y_0} 
\rightarrow \mathbf{Q}\)</code> by computing thin decomposition <code 
class="highlighter-rouge">\(\mathbf{Y_0} = \mathbf{QR}\)</code>. Also, <code 
class="highlighter-rouge">\(\mathbf{Q}\in\mathbb{R}^{m\times(k+p)}, 
\mathbf{R}\in\mathbb{R}^{(k+p)\times(k+p)}\)</code>.</li>
-  <li><code class="highlighter-rouge">\(\mathbf{s_Q \leftarrow Q^\top 
1}\)</code>.</li>
-  <li><code class="highlighter-rouge">\(\mathbf{B_0 \leftarrow Q^\top A: B \in 
\mathbb{R}^{(k+p)\times n}}\)</code>.</li>
-  <li><code class="highlighter-rouge">\(\mathbf{s_B \leftarrow {B_0}^\top 
\mu}\)</code>.</li>
+  <li>Create seed for random <em>n</em> <code 
class="highlighter-rouge">$$\times$$</code> <em>(k+p)</em> matrix <code 
class="highlighter-rouge">$$\Omega$$</code>.</li>
+  <li><code class="highlighter-rouge">$$\mathbf{s_\Omega \leftarrow 
\Omega^\top \mu}$$</code>.</li>
+  <li><code class="highlighter-rouge">$$\mathbf{Y_0 \leftarrow A\Omega − 1 
{s_\Omega}^\top, Y \in \mathbb{R}^{m\times(k+p)}}$$</code>.</li>
+  <li>Column-orthonormalize <code class="highlighter-rouge">$$\mathbf{Y_0} 
\rightarrow \mathbf{Q}$$</code> by computing thin decomposition <code 
class="highlighter-rouge">$$\mathbf{Y_0} = \mathbf{QR}$$</code>. Also, <code 
class="highlighter-rouge">$$\mathbf{Q}\in\mathbb{R}^{m\times(k+p)}, 
\mathbf{R}\in\mathbb{R}^{(k+p)\times(k+p)}$$</code>.</li>
+  <li><code class="highlighter-rouge">$$\mathbf{s_Q \leftarrow Q^\top 
1}$$</code>.</li>
+  <li><code class="highlighter-rouge">$$\mathbf{B_0 \leftarrow Q^\top A: B \in 
\mathbb{R}^{(k+p)\times n}}$$</code>.</li>
+  <li><code class="highlighter-rouge">$$\mathbf{s_B \leftarrow {B_0}^\top 
\mu}$$</code>.</li>
   <li>For <em>i</em> in 1..<em>q</em> repeat (power iterations):
     <ul>
-      <li>For <em>j</em> in 1..<em>n</em> apply <code 
class="highlighter-rouge">\(\mathbf{(B_{i−1})_{∗j} \leftarrow 
(B_{i−1})_{∗j}−\mu_j s_Q}\)</code>.</li>
-      <li><code class="highlighter-rouge">\(\mathbf{Y_i \leftarrow 
A{B_{i−1}}^\top−1(s_B−\mu^\top \mu s_Q)^\top}\)</code>.</li>
-      <li>Column-orthonormalize <code class="highlighter-rouge">\(\mathbf{Y_i} 
\rightarrow \mathbf{Q}\)</code> by computing thin decomposition <code 
class="highlighter-rouge">\(\mathbf{Y_i = QR}\)</code>.</li>
-      <li><code class="highlighter-rouge">\(\mathbf{s_Q \leftarrow Q^\top 
1}\)</code>.</li>
-      <li><code class="highlighter-rouge">\(\mathbf{B_i \leftarrow Q^\top 
A}\)</code>.</li>
-      <li><code class="highlighter-rouge">\(\mathbf{s_B \leftarrow {B_i}^\top 
\mu}\)</code>.</li>
+      <li>For <em>j</em> in 1..<em>n</em> apply <code 
class="highlighter-rouge">$$\mathbf{(B_{i−1})_{∗j} \leftarrow 
(B_{i−1})_{∗j}−\mu_j s_Q}$$</code>.</li>
+      <li><code class="highlighter-rouge">$$\mathbf{Y_i \leftarrow 
A{B_{i−1}}^\top−1(s_B−\mu^\top \mu s_Q)^\top}$$</code>.</li>
+      <li>Column-orthonormalize <code class="highlighter-rouge">$$\mathbf{Y_i} 
\rightarrow \mathbf{Q}$$</code> by computing thin decomposition <code 
class="highlighter-rouge">$$\mathbf{Y_i = QR}$$</code>.</li>
+      <li><code class="highlighter-rouge">$$\mathbf{s_Q \leftarrow Q^\top 
1}$$</code>.</li>
+      <li><code class="highlighter-rouge">$$\mathbf{B_i \leftarrow Q^\top 
A}$$</code>.</li>
+      <li><code class="highlighter-rouge">$$\mathbf{s_B \leftarrow {B_i}^\top 
\mu}$$</code>.</li>
     </ul>
   </li>
-  <li>Let <code class="highlighter-rouge">\(\mathbf{C \triangleq s_Q 
{s_B}^\top}\)</code>. <code class="highlighter-rouge">\(\mathbf{M \leftarrow 
B_q {B_q}^\top − C − C^\top + \mu^\top \mu s_Q {s_Q}^\top}\)</code>.</li>
-  <li>Compute an eigensolution of the small symmetric <code 
class="highlighter-rouge">\(\mathbf{M = \hat{U} \Lambda \hat{U}^\top: M \in 
\mathbb{R}^{(k+p)\times(k+p)}}\)</code>.</li>
-  <li>The singular values <code class="highlighter-rouge">\(\Sigma = 
\Lambda^{\circ 0.5}\)</code>, or, in other words, <code 
class="highlighter-rouge">\(\mathbf{\sigma_i= \sqrt{\lambda_i}}\)</code>.</li>
-  <li>If needed, compute <code class="highlighter-rouge">\(\mathbf{U = 
Q\hat{U}}\)</code>.</li>
-  <li>If needed, compute <code class="highlighter-rouge">\(\mathbf{V = B^\top 
\hat{U} \Sigma^{−1}}\)</code>.</li>
-  <li>If needed, items converted to the PCA space can be computed as <code 
class="highlighter-rouge">\(\mathbf{U\Sigma}\)</code>.</li>
+  <li>Let <code class="highlighter-rouge">$$\mathbf{C \triangleq s_Q 
{s_B}^\top}$$</code>. <code class="highlighter-rouge">$$\mathbf{M \leftarrow 
B_q {B_q}^\top − C − C^\top + \mu^\top \mu s_Q {s_Q}^\top}$$</code>.</li>
+  <li>Compute an eigensolution of the small symmetric <code 
class="highlighter-rouge">$$\mathbf{M = \hat{U} \Lambda \hat{U}^\top: M \in 
\mathbb{R}^{(k+p)\times(k+p)}}$$</code>.</li>
+  <li>The singular values <code class="highlighter-rouge">$$\Sigma = 
\Lambda^{\circ 0.5}$$</code>, or, in other words, <code 
class="highlighter-rouge">$$\mathbf{\sigma_i= \sqrt{\lambda_i}}$$</code>.</li>
+  <li>If needed, compute <code class="highlighter-rouge">$$\mathbf{U = 
Q\hat{U}}$$</code>.</li>
+  <li>If needed, compute <code class="highlighter-rouge">$$\mathbf{V = B^\top 
\hat{U} \Sigma^{−1}}$$</code>.</li>
+  <li>If needed, items converted to the PCA space can be computed as <code 
class="highlighter-rouge">$$\mathbf{U\Sigma}$$</code>.</li>
 </ol>
 
 <h2 id="implementation">Implementation</h2>

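For readers of the d-spca page touched above, a minimal usage sketch in the Samsara Scala DSL may help tie the (k, p, q) notation of the algorithm to code. It assumes the dspca routine from org.apache.mahout.math.decompositions (the distributed stochastic PCA entry point) and an already-defined DRM drmA; the parameter values are placeholders, not recommendations.

    import org.apache.mahout.math.scalabindings._
    import RLikeOps._
    import org.apache.mahout.math.drm._
    import RLikeDrmOps._
    import org.apache.mahout.math.decompositions._

    // Distributed stochastic PCA: rank k, oversampling p, q power iterations,
    // matching the algorithm steps above. U and V come back as DRMs, the
    // singular values as an in-core vector s.
    val (drmU, drmV, s) = dspca(drmA, k = 10, p = 15, q = 1)

    // Step 14 above: items mapped into the PCA space are U * Sigma.
    val drmUSigma = drmU %*% diagv(s)

As with dssvd below, U and V are returned lazily, so no distributed work happens for the last line until drmUSigma is actually used.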
http://git-wip-us.apache.org/repos/asf/mahout/blob/bdaf56d2/docs/latest/algorithms/linear-algebra/d-ssvd.html
----------------------------------------------------------------------
diff --git a/docs/latest/algorithms/linear-algebra/d-ssvd.html 
b/docs/latest/algorithms/linear-algebra/d-ssvd.html
index 3f16ca0..ac339ca 100644
--- a/docs/latest/algorithms/linear-algebra/d-ssvd.html
+++ b/docs/latest/algorithms/linear-algebra/d-ssvd.html
@@ -171,50 +171,50 @@
 
 <h2 id="modified-ssvd-algorithm">Modified SSVD Algorithm</h2>
 
-<p>Given an <code class="highlighter-rouge">\(m\times n\)</code>
-matrix <code class="highlighter-rouge">\(\mathbf{A}\)</code>, a target rank 
<code class="highlighter-rouge">\(k\in\mathbb{N}_{1}\)</code>
-, an oversampling parameter <code 
class="highlighter-rouge">\(p\in\mathbb{N}_{1}\)</code>, 
-and the number of additional power iterations <code 
class="highlighter-rouge">\(q\in\mathbb{N}_{0}\)</code>, 
-this procedure computes an <code 
class="highlighter-rouge">\(m\times\left(k+p\right)\)</code>
-SVD <code class="highlighter-rouge">\(\mathbf{A\approx 
U}\boldsymbol{\Sigma}\mathbf{V}^{\top}\)</code>:</p>
+<p>Given an <code class="highlighter-rouge">$$m\times n$$</code>
+matrix <code class="highlighter-rouge">$$\mathbf{A}$$</code>, a target rank 
<code class="highlighter-rouge">$$k\in\mathbb{N}_{1}$$</code>
+, an oversampling parameter <code 
class="highlighter-rouge">$$p\in\mathbb{N}_{1}$$</code>,
+and the number of additional power iterations <code 
class="highlighter-rouge">$$q\in\mathbb{N}_{0}$$</code>,
+this procedure computes an <code 
class="highlighter-rouge">$$m\times\left(k+p\right)$$</code>
+SVD <code class="highlighter-rouge">$$\mathbf{A\approx 
U}\boldsymbol{\Sigma}\mathbf{V}^{\top}$$</code>:</p>
 
 <ol>
   <li>
-    <p>Create seed for random <code 
class="highlighter-rouge">\(n\times\left(k+p\right)\)</code>
-  matrix <code class="highlighter-rouge">\(\boldsymbol{\Omega}\)</code>. The 
seed defines matrix <code class="highlighter-rouge">\(\mathbf{\Omega}\)</code>
+    <p>Create seed for random <code 
class="highlighter-rouge">$$n\times\left(k+p\right)$$</code>
+  matrix <code class="highlighter-rouge">$$\boldsymbol{\Omega}$$</code>. The 
seed defines matrix <code class="highlighter-rouge">$$\mathbf{\Omega}$$</code>
   using Gaussian unit vectors per one of suggestions in [Halko, Martinsson, 
Tropp].</p>
   </li>
   <li>
-    <p><code 
class="highlighter-rouge">\(\mathbf{Y=A\boldsymbol{\Omega}},\,\mathbf{Y}\in\mathbb{R}^{m\times\left(k+p\right)}\)</code></p>
+    <p><code 
class="highlighter-rouge">$$\mathbf{Y=A\boldsymbol{\Omega}},\,\mathbf{Y}\in\mathbb{R}^{m\times\left(k+p\right)}$$</code></p>
   </li>
   <li>
-    <p>Column-orthonormalize <code 
class="highlighter-rouge">\(\mathbf{Y}\rightarrow\mathbf{Q}\)</code>
-  by computing thin decomposition <code 
class="highlighter-rouge">\(\mathbf{Y}=\mathbf{Q}\mathbf{R}\)</code>.
-  Also, <code 
class="highlighter-rouge">\(\mathbf{Q}\in\mathbb{R}^{m\times\left(k+p\right)},\,\mathbf{R}\in\mathbb{R}^{\left(k+p\right)\times\left(k+p\right)}\)</code>;
 denoted as <code 
class="highlighter-rouge">\(\mathbf{Q}=\mbox{qr}\left(\mathbf{Y}\right).\mathbf{Q}\)</code></p>
+    <p>Column-orthonormalize <code 
class="highlighter-rouge">$$\mathbf{Y}\rightarrow\mathbf{Q}$$</code>
+  by computing thin decomposition <code 
class="highlighter-rouge">$$\mathbf{Y}=\mathbf{Q}\mathbf{R}$$</code>.
+  Also, <code 
class="highlighter-rouge">$$\mathbf{Q}\in\mathbb{R}^{m\times\left(k+p\right)},\,\mathbf{R}\in\mathbb{R}^{\left(k+p\right)\times\left(k+p\right)}$$</code>;
 denoted as <code 
class="highlighter-rouge">$$\mathbf{Q}=\mbox{qr}\left(\mathbf{Y}\right).\mathbf{Q}$$</code></p>
   </li>
   <li>
-    <p><code 
class="highlighter-rouge">\(\mathbf{B}_{0}=\mathbf{Q}^{\top}\mathbf{A}:\,\,\mathbf{B}\in\mathbb{R}^{\left(k+p\right)\times
 n}\)</code>.</p>
+    <p><code 
class="highlighter-rouge">$$\mathbf{B}_{0}=\mathbf{Q}^{\top}\mathbf{A}:\,\,\mathbf{B}\in\mathbb{R}^{\left(k+p\right)\times
 n}$$</code>.</p>
   </li>
   <li>
-    <p>If <code class="highlighter-rouge">\(q&gt;0\)</code>
-  repeat: for <code class="highlighter-rouge">\(i=1..q\)</code>: 
-  <code 
class="highlighter-rouge">\(\mathbf{B}_{i}^{\top}=\mathbf{A}^{\top}\mbox{qr}\left(\mathbf{A}\mathbf{B}_{i-1}^{\top}\right).\mathbf{Q}\)</code>
+    <p>If <code class="highlighter-rouge">$$q&gt;0$$</code>
+  repeat: for <code class="highlighter-rouge">$$i=1..q$$</code>:
+  <code 
class="highlighter-rouge">$$\mathbf{B}_{i}^{\top}=\mathbf{A}^{\top}\mbox{qr}\left(\mathbf{A}\mathbf{B}_{i-1}^{\top}\right).\mathbf{Q}$$</code>
   (power iterations step).</p>
   </li>
   <li>
-    <p>Compute Eigensolution of a small Hermitian <code 
class="highlighter-rouge">\(\mathbf{B}_{q}\mathbf{B}_{q}^{\top}=\mathbf{\hat{U}}\boldsymbol{\Lambda}\mathbf{\hat{U}}^{\top}\)</code>,
-  <code 
class="highlighter-rouge">\(\mathbf{B}_{q}\mathbf{B}_{q}^{\top}\in\mathbb{R}^{\left(k+p\right)\times\left(k+p\right)}\)</code>.</p>
+    <p>Compute Eigensolution of a small Hermitian <code 
class="highlighter-rouge">$$\mathbf{B}_{q}\mathbf{B}_{q}^{\top}=\mathbf{\hat{U}}\boldsymbol{\Lambda}\mathbf{\hat{U}}^{\top}$$</code>,
+  <code 
class="highlighter-rouge">$$\mathbf{B}_{q}\mathbf{B}_{q}^{\top}\in\mathbb{R}^{\left(k+p\right)\times\left(k+p\right)}$$</code>.</p>
   </li>
   <li>
-    <p>Singular values <code 
class="highlighter-rouge">\(\mathbf{\boldsymbol{\Sigma}}=\boldsymbol{\Lambda}^{0.5}\)</code>,
-  or, in other words, <code 
class="highlighter-rouge">\(s_{i}=\sqrt{\sigma_{i}}\)</code>.</p>
+    <p>Singular values <code 
class="highlighter-rouge">$$\mathbf{\boldsymbol{\Sigma}}=\boldsymbol{\Lambda}^{0.5}$$</code>,
+  or, in other words, <code 
class="highlighter-rouge">$$s_{i}=\sqrt{\sigma_{i}}$$</code>.</p>
   </li>
   <li>
-    <p>If needed, compute <code 
class="highlighter-rouge">\(\mathbf{U}=\mathbf{Q}\hat{\mathbf{U}}\)</code>.</p>
+    <p>If needed, compute <code 
class="highlighter-rouge">$$\mathbf{U}=\mathbf{Q}\hat{\mathbf{U}}$$</code>.</p>
   </li>
   <li>
-    <p>If needed, compute <code 
class="highlighter-rouge">\(\mathbf{V}=\mathbf{B}_{q}^{\top}\hat{\mathbf{U}}\boldsymbol{\Sigma}^{-1}\)</code>.
-Another way is <code 
class="highlighter-rouge">\(\mathbf{V}=\mathbf{A}^{\top}\mathbf{U}\boldsymbol{\Sigma}^{-1}\)</code>.</p>
+    <p>If needed, compute <code 
class="highlighter-rouge">$$\mathbf{V}=\mathbf{B}_{q}^{\top}\hat{\mathbf{U}}\boldsymbol{\Sigma}^{-1}$$</code>.
+Another way is <code 
class="highlighter-rouge">$$\mathbf{V}=\mathbf{A}^{\top}\mathbf{U}\boldsymbol{\Sigma}^{-1}$$</code>.</p>
   </li>
 </ol>
 
@@ -281,7 +281,7 @@ Another way is <code 
class="highlighter-rouge">\(\mathbf{V}=\mathbf{A}^{\top}\ma
 </code></pre>
 </div>
 
-<p>Note: As a side effect of checkpointing, U and V values are returned as 
logical operators (i.e. they are neither checkpointed nor computed).  Therefore 
there is no physical work actually done to compute <code 
class="highlighter-rouge">\(\mathbf{U}\)</code> or <code 
class="highlighter-rouge">\(\mathbf{V}\)</code> until they are used in a 
subsequent expression.</p>
+<p>Note: As a side effect of checkpointing, U and V values are returned as 
logical operators (i.e. they are neither checkpointed nor computed).  Therefore 
there is no physical work actually done to compute <code 
class="highlighter-rouge">$$\mathbf{U}$$</code> or <code 
class="highlighter-rouge">$$\mathbf{V}$$</code> until they are used in a 
subsequent expression.</p>
 
 <h2 id="usage">Usage</h2>
 

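As a companion to the d-ssvd Usage section above, a hedged sketch of calling the decomposition from the Samsara Scala DSL (again assuming the dssvd routine in org.apache.mahout.math.decompositions and a placeholder DRM drmA):

    import org.apache.mahout.math.scalabindings._
    import RLikeOps._
    import org.apache.mahout.math.drm._
    import RLikeDrmOps._
    import org.apache.mahout.math.decompositions._

    // Rank-k SSVD with oversampling p and q power iterations.
    val (drmU, drmV, s) = dssvd(drmA, k = 40, p = 15, q = 1)

    // Per the note above, drmU and drmV are still logical operators at this
    // point; forcing one of them triggers the actual computation:
    val inCoreU = drmU.collect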
http://git-wip-us.apache.org/repos/asf/mahout/blob/bdaf56d2/docs/latest/algorithms/map-reduce/classification/bayesian.html
----------------------------------------------------------------------
diff --git a/docs/latest/algorithms/map-reduce/classification/bayesian.html 
b/docs/latest/algorithms/map-reduce/classification/bayesian.html
index 9c70058..5d11c37 100644
--- a/docs/latest/algorithms/map-reduce/classification/bayesian.html
+++ b/docs/latest/algorithms/map-reduce/classification/bayesian.html
@@ -181,38 +181,38 @@
 <p>As described in <a 
href="http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf";>[1]</a> Mahout 
Naive Bayes is broken down into the following steps (assignments are over all 
possible index values):</p>
 
 <ul>
-  <li>Let <code 
class="highlighter-rouge">\(\vec{d}=(\vec{d_1},...,\vec{d_n})\)</code> be a set 
of documents; <code class="highlighter-rouge">\(d_{ij}\)</code> is the count of 
word <code class="highlighter-rouge">\(i\)</code> in document <code 
class="highlighter-rouge">\(j\)</code>.</li>
-  <li>Let <code class="highlighter-rouge">\(\vec{y}=(y_1,...,y_n)\)</code> be 
their labels.</li>
-  <li>Let <code class="highlighter-rouge">\(\alpha_i\)</code> be a smoothing 
parameter for all words in the vocabulary; let <code 
class="highlighter-rouge">\(\alpha=\sum_i{\alpha_i}\)</code>.</li>
-  <li><strong>Preprocessing</strong>(via seq2Sparse) TF-IDF transformation and 
L2 length normalization of <code class="highlighter-rouge">\(\vec{d}\)</code>
+  <li>Let <code 
class="highlighter-rouge">$$\vec{d}=(\vec{d_1},...,\vec{d_n})$$</code> be a set 
of documents; <code class="highlighter-rouge">$$d_{ij}$$</code> is the count of 
word <code class="highlighter-rouge">$$i$$</code> in document <code 
class="highlighter-rouge">$$j$$</code>.</li>
+  <li>Let <code class="highlighter-rouge">$$\vec{y}=(y_1,...,y_n)$$</code> be 
their labels.</li>
+  <li>Let <code class="highlighter-rouge">$$\alpha_i$$</code> be a smoothing 
parameter for all words in the vocabulary; let <code 
class="highlighter-rouge">$$\alpha=\sum_i{\alpha_i}$$</code>.</li>
+  <li><strong>Preprocessing</strong>(via seq2Sparse) TF-IDF transformation and 
L2 length normalization of <code class="highlighter-rouge">$$\vec{d}$$</code>
     <ol>
-      <li><code class="highlighter-rouge">\(d_{ij} = 
\sqrt{d_{ij}}\)</code></li>
-      <li><code class="highlighter-rouge">\(d_{ij} = 
d_{ij}\left(\log{\frac{\sum_k1}{\sum_k\delta_{ik}+1}}+1\right)\)</code></li>
-      <li><code class="highlighter-rouge">\(d_{ij} 
=\frac{d_{ij}}{\sqrt{\sum_k{d_{kj}^2}}}\)</code></li>
+      <li><code class="highlighter-rouge">$$d_{ij} = 
\sqrt{d_{ij}}$$</code></li>
+      <li><code class="highlighter-rouge">$$d_{ij} = 
d_{ij}\left(\log{\frac{\sum_k1}{\sum_k\delta_{ik}+1}}+1\right)$$</code></li>
+      <li><code class="highlighter-rouge">$$d_{ij} 
=\frac{d_{ij}}{\sqrt{\sum_k{d_{kj}^2}}}$$</code></li>
     </ol>
   </li>
-  <li><strong>Training: Bayes</strong><code 
class="highlighter-rouge">\((\vec{d},\vec{y})\)</code> calculate term weights 
<code class="highlighter-rouge">\(w_{ci}\)</code> as:
+  <li><strong>Training: Bayes</strong><code 
class="highlighter-rouge">$$(\vec{d},\vec{y})$$</code> calculate term weights 
<code class="highlighter-rouge">$$w_{ci}$$</code> as:
     <ol>
-      <li><code 
class="highlighter-rouge">\(\hat\theta_{ci}=\frac{d_{ic}+\alpha_i}{\sum_k{d_{kc}}+\alpha}\)</code></li>
-      <li><code 
class="highlighter-rouge">\(w_{ci}=\log{\hat\theta_{ci}}\)</code></li>
+      <li><code 
class="highlighter-rouge">$$\hat\theta_{ci}=\frac{d_{ic}+\alpha_i}{\sum_k{d_{kc}}+\alpha}$$</code></li>
+      <li><code 
class="highlighter-rouge">$$w_{ci}=\log{\hat\theta_{ci}}$$</code></li>
     </ol>
   </li>
-  <li><strong>Training: CBayes</strong><code 
class="highlighter-rouge">\((\vec{d},\vec{y})\)</code> calculate term weights 
<code class="highlighter-rouge">\(w_{ci}\)</code> as:
+  <li><strong>Training: CBayes</strong><code 
class="highlighter-rouge">$$(\vec{d},\vec{y})$$</code> calculate term weights 
<code class="highlighter-rouge">$$w_{ci}$$</code> as:
     <ol>
-      <li><code class="highlighter-rouge">\(\hat\theta_{ci} = 
\frac{\sum_{j:y_j\neq c}d_{ij}+\alpha_i}{\sum_{j:y_j\neq 
c}{\sum_k{d_{kj}}}+\alpha}\)</code></li>
-      <li><code 
class="highlighter-rouge">\(w_{ci}=-\log{\hat\theta_{ci}}\)</code></li>
-      <li><code class="highlighter-rouge">\(w_{ci}=\frac{w_{ci}}{\sum_i \lvert 
w_{ci}\rvert}\)</code></li>
+      <li><code class="highlighter-rouge">$$\hat\theta_{ci} = 
\frac{\sum_{j:y_j\neq c}d_{ij}+\alpha_i}{\sum_{j:y_j\neq 
c}{\sum_k{d_{kj}}}+\alpha}$$</code></li>
+      <li><code 
class="highlighter-rouge">$$w_{ci}=-\log{\hat\theta_{ci}}$$</code></li>
+      <li><code class="highlighter-rouge">$$w_{ci}=\frac{w_{ci}}{\sum_i \lvert 
w_{ci}\rvert}$$</code></li>
     </ol>
   </li>
   <li><strong>Label Assignment/Testing:</strong>
     <ol>
-      <li>Let <code class="highlighter-rouge">\(\vec{t}= 
(t_1,...,t_n)\)</code> be a test document; let <code 
class="highlighter-rouge">\(t_i\)</code> be the count of the word <code 
class="highlighter-rouge">\(t\)</code>.</li>
-      <li>Label the document according to <code 
class="highlighter-rouge">\(l(t)=\arg\max_c \sum\limits_{i} t_i 
w_{ci}\)</code></li>
+      <li>Let <code class="highlighter-rouge">$$\vec{t}= 
(t_1,...,t_n)$$</code> be a test document; let <code 
class="highlighter-rouge">$$t_i$$</code> be the count of the word <code 
class="highlighter-rouge">$$t$$</code>.</li>
+      <li>Label the document according to <code 
class="highlighter-rouge">$$l(t)=\arg\max_c \sum\limits_{i} t_i 
w_{ci}$$</code></li>
     </ol>
   </li>
 </ul>
 
-<p>As we can see, the main difference between Bayes and CBayes is the weight 
calculation step.  Where Bayes weighs terms more heavily based on the 
likelihood that they belong to class <code 
class="highlighter-rouge">\(c\)</code>, CBayes seeks to maximize term weights 
on the likelihood that they do not belong to any other class.</p>
+<p>As we can see, the main difference between Bayes and CBayes is the weight 
calculation step.  Where Bayes weighs terms more heavily based on the 
likelihood that they belong to class <code 
class="highlighter-rouge">$$c$$</code>, CBayes seeks to maximize term weights 
on the likelihood that they do not belong to any other class.</p>
 
 <h2 id="running-from-the-command-line">Running from the command line</h2>
 

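To make the Bayes weight formulas above concrete, here is a small in-core sketch (not Mahout's distributed trainer) of the per-class weight calculation in the Samsara in-core DSL; the toy counts matrix and smoothing value are made up for illustration:

    import org.apache.mahout.math.scalabindings._
    import RLikeOps._

    // Toy aggregated term counts: rows are classes c, columns are words i,
    // i.e. entry (c, i) plays the role of d_ic summed over documents in class c.
    val counts = dense((3.0, 0.0, 1.0),
                       (0.0, 2.0, 2.0))

    val alphaI = 1.0                      // per-word smoothing alpha_i
    val alpha  = alphaI * counts.ncol     // alpha = sum_i alpha_i

    // Bayes weights: w_ci = log( (d_ic + alpha_i) / (sum_k d_kc + alpha) )
    val weights = counts.like()
    for (c <- 0 until counts.nrow; i <- 0 until counts.ncol) {
      val classTotal = counts(c, ::).sum
      weights(c, i) = math.log((counts(c, i) + alphaI) / (classTotal + alpha))
    }

CBayes differs only in the aggregation and sign: the counts are taken over all classes other than c, the log is negated, and each row of weights is divided by the sum of its absolute values, exactly as in the steps above.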
http://git-wip-us.apache.org/repos/asf/mahout/blob/bdaf56d2/docs/latest/algorithms/map-reduce/clustering/spectral-clustering.html
----------------------------------------------------------------------
diff --git 
a/docs/latest/algorithms/map-reduce/clustering/spectral-clustering.html 
b/docs/latest/algorithms/map-reduce/clustering/spectral-clustering.html
index 0a808c2..697b563 100644
--- a/docs/latest/algorithms/map-reduce/clustering/spectral-clustering.html
+++ b/docs/latest/algorithms/map-reduce/clustering/spectral-clustering.html
@@ -173,16 +173,16 @@
 
 <ol>
   <li>
-    <p>Computing a similarity (or <em>affinity</em>) matrix <code 
class="highlighter-rouge">\(\mathbf{A}\)</code> from the data. This involves 
determining a pairwise distance function <code 
class="highlighter-rouge">\(f\)</code> that takes a pair of data points and 
returns a scalar.</p>
+    <p>Computing a similarity (or <em>affinity</em>) matrix <code 
class="highlighter-rouge">$$\mathbf{A}$$</code> from the data. This involves 
determining a pairwise distance function <code 
class="highlighter-rouge">$$f$$</code> that takes a pair of data points and 
returns a scalar.</p>
   </li>
   <li>
-    <p>Computing a graph Laplacian <code 
class="highlighter-rouge">\(\mathbf{L}\)</code> from the affinity matrix. There 
are several types of graph Laplacians; which is used will often depends on the 
situation.</p>
+    <p>Computing a graph Laplacian <code 
class="highlighter-rouge">$$\mathbf{L}$$</code> from the affinity matrix. There 
are several types of graph Laplacians; which is used will often depends on the 
situation.</p>
   </li>
   <li>
-    <p>Computing the eigenvectors and eigenvalues of <code 
class="highlighter-rouge">\(\mathbf{L}\)</code>. The degree of this 
decomposition is often modulated by <code 
class="highlighter-rouge">\(k\)</code>, or the number of clusters. Put another 
way, <code class="highlighter-rouge">\(k\)</code> eigenvectors and eigenvalues 
are computed.</p>
+    <p>Computing the eigenvectors and eigenvalues of <code 
class="highlighter-rouge">$$\mathbf{L}$$</code>. The degree of this 
decomposition is often modulated by <code 
class="highlighter-rouge">$$k$$</code>, or the number of clusters. Put another 
way, <code class="highlighter-rouge">$$k$$</code> eigenvectors and eigenvalues 
are computed.</p>
   </li>
   <li>
-    <p>The <code class="highlighter-rouge">\(k\)</code> eigenvectors are used 
as “proxy” data for the original dataset, and fed into k-means clustering. 
The resulting cluster assignments are transparently passed back to the original 
data.</p>
+    <p>The <code class="highlighter-rouge">$$k$$</code> eigenvectors are used 
as “proxy” data for the original dataset, and fed into k-means clustering. 
The resulting cluster assignments are transparently passed back to the original 
data.</p>
   </li>
 </ol>
 
@@ -196,11 +196,11 @@
 
 <h2 id="input">Input</h2>
 
-<p>The input format for the algorithm currently takes the form of a 
Hadoop-backed affinity matrix in the form of text files. Each line of the text 
file specifies a single element of the affinity matrix: the row index <code 
class="highlighter-rouge">\(i\)</code>, the column index <code 
class="highlighter-rouge">\(j\)</code>, and the value:</p>
+<p>The input format for the algorithm currently takes the form of a 
Hadoop-backed affinity matrix in the form of text files. Each line of the text 
file specifies a single element of the affinity matrix: the row index <code 
class="highlighter-rouge">$$i$$</code>, the column index <code 
class="highlighter-rouge">$$j$$</code>, and the value:</p>
 
 <p><code class="highlighter-rouge">i, j, value</code></p>
 
-<p>The affinity matrix is symmetric, and any unspecified <code 
class="highlighter-rouge">\(i, j\)</code> pairs are assumed to be 0 for 
sparsity. The row and column indices are 0-indexed. Thus, only the non-zero 
entries of either the upper or lower triangular need be specified.</p>
+<p>The affinity matrix is symmetric, and any unspecified <code 
class="highlighter-rouge">$$i, j$$</code> pairs are assumed to be 0 for 
sparsity. The row and column indices are 0-indexed. Thus, only the non-zero 
entries of either the upper or lower triangular need be specified.</p>
 
 <p>The matrix elements specified in the text files are collected into a Mahout 
<code class="highlighter-rouge">DistributedRowMatrix</code>.</p>
 

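As a small in-core illustration of steps 1-3 above (the page itself describes the Hadoop/MapReduce implementation), one common graph Laplacian is L = D - A, with D holding the row sums of the affinity matrix on its diagonal. The toy affinity values below are made up:

    import org.apache.mahout.math.scalabindings._
    import RLikeOps._

    // Toy symmetric affinity matrix A for three points (step 1).
    val A = dense((0.0, 0.8, 0.1),
                  (0.8, 0.0, 0.3),
                  (0.1, 0.3, 0.0))

    // Unnormalized graph Laplacian L = D - A (one choice for step 2).
    val D = diagv(A.rowSums)
    val L = D - A

    // The k smallest eigenvectors of L (step 3) then act as the proxy data
    // that is handed to k-means in step 4.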
http://git-wip-us.apache.org/repos/asf/mahout/blob/bdaf56d2/docs/latest/algorithms/reccomenders/d-als.html
----------------------------------------------------------------------
diff --git a/docs/latest/algorithms/reccomenders/d-als.html 
b/docs/latest/algorithms/reccomenders/d-als.html
index 96abb53..01ca25f 100644
--- a/docs/latest/algorithms/reccomenders/d-als.html
+++ b/docs/latest/algorithms/reccomenders/d-als.html
@@ -174,7 +174,7 @@ TODO: Find the ALS Page</p>
 
 <h2 id="algorithm">Algorithm</h2>
 
-<p>For the classic QR decomposition of the form <code 
class="highlighter-rouge">\(\mathbf{A}=\mathbf{QR},\mathbf{A}\in\mathbb{R}^{m\times
 n}\)</code> a distributed version is fairly easily achieved if <code 
class="highlighter-rouge">\(\mathbf{A}\)</code> is tall and thin such that 
<code class="highlighter-rouge">\(\mathbf{A}^{\top}\mathbf{A}\)</code> fits in 
memory, i.e. <em>m</em> is large but <em>n</em> &lt; ~5000 Under such 
circumstances, only <code class="highlighter-rouge">\(\mathbf{A}\)</code> and 
<code class="highlighter-rouge">\(\mathbf{Q}\)</code> are distributed matricies 
and <code class="highlighter-rouge">\(\mathbf{A^{\top}A}\)</code> and <code 
class="highlighter-rouge">\(\mathbf{R}\)</code> are in-core products. We just 
compute the in-core version of the Cholesky decomposition in the form of <code 
class="highlighter-rouge">\(\mathbf{LL}^{\top}= 
\mathbf{A}^{\top}\mathbf{A}\)</code>.  After that we take <code 
class="highlighter-rouge">\(\mathbf{R}= \mathbf{L}^{\top}\)</co
 de> and <code 
class="highlighter-rouge">\(\mathbf{Q}=\mathbf{A}\left(\mathbf{L}^{\top}\right)^{-1}\)</code>.
  The latter is easily achieved by multiplying each verticle block of <code 
class="highlighter-rouge">\(\mathbf{A}\)</code> by <code 
class="highlighter-rouge">\(\left(\mathbf{L}^{\top}\right)^{-1}\)</code>.  
(There is no actual matrix inversion happening).</p>
+<p>For the classic QR decomposition of the form <code 
class="highlighter-rouge">$$\mathbf{A}=\mathbf{QR},\mathbf{A}\in\mathbb{R}^{m\times
 n}$$</code> a distributed version is fairly easily achieved if <code 
class="highlighter-rouge">$$\mathbf{A}$$</code> is tall and thin such that 
<code class="highlighter-rouge">$$\mathbf{A}^{\top}\mathbf{A}$$</code> fits in 
memory, i.e. <em>m</em> is large but <em>n</em> &lt; ~5000 Under such 
circumstances, only <code class="highlighter-rouge">$$\mathbf{A}$$</code> and 
<code class="highlighter-rouge">$$\mathbf{Q}$$</code> are distributed matricies 
and <code class="highlighter-rouge">$$\mathbf{A^{\top}A}$$</code> and <code 
class="highlighter-rouge">$$\mathbf{R}$$</code> are in-core products. We just 
compute the in-core version of the Cholesky decomposition in the form of <code 
class="highlighter-rouge">$$\mathbf{LL}^{\top}= 
\mathbf{A}^{\top}\mathbf{A}$$</code>.  After that we take <code 
class="highlighter-rouge">$$\mathbf{R}= \mathbf{L}^{\top}$$</co
 de> and <code 
class="highlighter-rouge">$$\mathbf{Q}=\mathbf{A}\left(\mathbf{L}^{\top}\right)^{-1}$$</code>.
  The latter is easily achieved by multiplying each verticle block of <code 
class="highlighter-rouge">$$\mathbf{A}$$</code> by <code 
class="highlighter-rouge">$$\left(\mathbf{L}^{\top}\right)^{-1}$$</code>.  
(There is no actual matrix inversion happening).</p>
 
 <h2 id="implementation">Implementation</h2>
 

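The Cholesky-based thin QR described above can be followed in-core with the Samsara bindings. This is only an illustration of the algebra on a made-up matrix; the distributed code applies (L^T)^-1 block-wise by back substitution rather than forming an explicit inverse, so solve(.) is used here purely for brevity:

    import org.apache.mahout.math.scalabindings._
    import RLikeOps._

    // Tall-and-thin A, so A'A fits in memory.
    val A = dense((2.0, 1.0),
                  (1.0, 3.0),
                  (0.0, 1.0))

    val AtA = A.t %*% A
    val L   = chol(AtA).getL      // LL' = A'A
    val R   = L.t                 // R = L'
    val Q   = A %*% solve(L.t)    // Q = A (L')^{-1}

    // Sanity check: Q %*% R reproduces A up to round-off.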
http://git-wip-us.apache.org/repos/asf/mahout/blob/bdaf56d2/docs/latest/algorithms/regression/ols.html
----------------------------------------------------------------------
diff --git a/docs/latest/algorithms/regression/ols.html 
b/docs/latest/algorithms/regression/ols.html
index 241062d..41414ea 100644
--- a/docs/latest/algorithms/regression/ols.html
+++ b/docs/latest/algorithms/regression/ols.html
@@ -191,12 +191,12 @@ This is in stark contrast to many “big data machine 
learning” frameworks whi
      </tr>
      <tr>
         <td><code>'calcStandardErrors</code></td>
-        <td>Calculate the standard errors (and subsequent "t-scores" and 
"p-values") of the \(\boldsymbol{\beta}$$ estimates</td>
+        <td>Calculate the standard errors (and subsequent "t-scores" and 
"p-values") of the $$\boldsymbol{\beta}$$ estimates</td>
         <td><code>true</code></td>
      </tr>
      <tr>
         <td><code>'addIntercept</code></td>
-        <td>Add an intercept to \(\mathbf{X}$$</td>
+        <td>Add an intercept to $$\mathbf{X}$$</td>
         <td><code>true</code></td>
      </tr>                 
   </table>

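The two rows fixed in the table above are hyperparameters of the OLS fitter. A hedged sketch of passing them, assuming the OrdinaryLeastSquares fitter interface used on the algorithms pages (drmX and drmY are placeholder DRMs holding the features and the target):

    import org.apache.mahout.math.algorithms.regression.OrdinaryLeastSquares

    val model = new OrdinaryLeastSquares[Int]().fit(
      drmX, drmY,
      'calcStandardErrors -> true,   // also compute standard errors, t-scores, p-values
      'addIntercept -> true)         // prepend a column of ones to X

    println(model.summary)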
http://git-wip-us.apache.org/repos/asf/mahout/blob/bdaf56d2/docs/latest/index.html
----------------------------------------------------------------------
diff --git a/docs/latest/index.html b/docs/latest/index.html
index fabade9..d78820f 100644
--- a/docs/latest/index.html
+++ b/docs/latest/index.html
@@ -219,10 +219,10 @@ which are wrappers around RDDs (in Spark).</p>
 </div>
 
 <p>Which is</p>
-<center>\(\mathbf{A^\intercal A}\)</center>
+<center>$$\mathbf{A^\intercal A}$$</center>
 
 <p>Transposing a large matrix is a very expensive thing to do, and in this 
case we don’t actually need to do it. There is a
-more efficient way to calculate <foo>\(\mathbf{A^\intercal A}\)</foo> that 
doesn’t require a physical transpose.</p>
+more efficient way to calculate <foo>$$\mathbf{A^\intercal A}$$</foo> that 
doesn’t require a physical transpose.</p>
 
 <p>(Image showing this)</p>
 

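The page's point here is worth pinning down: in the Samsara DSL the product is written exactly as the math, and the optimizer recognizes the self-multiplication pattern, so no physical transpose of A is ever materialized (A below is the DRM from the page's example):

    import org.apache.mahout.math.drm._
    import RLikeDrmOps._

    // Logical plan only; rewritten into a single distributed pass over A.
    val drmAtA = A.t %*% A

    // Nothing runs until the (small) result is actually needed.
    val inCoreAtA = drmAtA.collect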
http://git-wip-us.apache.org/repos/asf/mahout/blob/bdaf56d2/docs/latest/mahout-samsara/in-core-reference.html
----------------------------------------------------------------------
diff --git a/docs/latest/mahout-samsara/in-core-reference.html 
b/docs/latest/mahout-samsara/in-core-reference.html
index 2fc8671..509ecef 100644
--- a/docs/latest/mahout-samsara/in-core-reference.html
+++ b/docs/latest/mahout-samsara/in-core-reference.html
@@ -414,7 +414,7 @@ a !== b
 </code></pre>
 </div>
 
-<p><em>note: Transposition is currently handled via view, i.e. updating a 
transposed matrix will be updating the original.</em>  Also computing something 
like <code class="highlighter-rouge">\(\mathbf{X^\top}\mathbf{X}\)</code>:</p>
+<p><em>note: Transposition is currently handled via view, i.e. updating a 
transposed matrix will be updating the original.</em>  Also computing something 
like <code class="highlighter-rouge">$$\mathbf{X^\top}\mathbf{X}$$</code>:</p>
 
 <div class="highlighter-rouge"><pre class="highlight"><code>val XtX = X.t %*% X
 </code></pre>
@@ -470,19 +470,19 @@ a !== b
 
 <p><strong>Solving linear equation systems and matrix inversion:</strong> 
fully similar to R semantics; there are three forms of invocation:</p>
 
-<p>Solve <code class="highlighter-rouge">\(\mathbf{AX}=\mathbf{B}\)</code>:</p>
+<p>Solve <code class="highlighter-rouge">$$\mathbf{AX}=\mathbf{B}$$</code>:</p>
 
 <div class="highlighter-rouge"><pre class="highlight"><code>solve(A, B)
 </code></pre>
 </div>
 
-<p>Solve <code class="highlighter-rouge">\(\mathbf{Ax}=\mathbf{b}\)</code>:</p>
+<p>Solve <code class="highlighter-rouge">$$\mathbf{Ax}=\mathbf{b}$$</code>:</p>
 
 <div class="highlighter-rouge"><pre class="highlight"><code>solve(A, b)
 </code></pre>
 </div>
 
-<p>Compute <code class="highlighter-rouge">\(\mathbf{A^{-1}}\)</code>:</p>
+<p>Compute <code class="highlighter-rouge">$$\mathbf{A^{-1}}$$</code>:</p>
 
 <div class="highlighter-rouge"><pre class="highlight"><code>solve(A)
 </code></pre>
@@ -520,19 +520,19 @@ m.rowMeans
 
 <h4 id="random-matrices">Random Matrices</h4>
 
-<p><code class="highlighter-rouge">\(\mathcal{U}\)</code>(0,1) random matrix 
view:</p>
+<p><code class="highlighter-rouge">$$\mathcal{U}$$</code>(0,1) random matrix 
view:</p>
 
 <div class="highlighter-rouge"><pre class="highlight"><code>val incCoreA = 
Matrices.uniformView(m, n, seed)
 </code></pre>
 </div>
 
-<p><code class="highlighter-rouge">\(\mathcal{U}\)</code>(-1,1) random matrix 
view:</p>
+<p><code class="highlighter-rouge">$$\mathcal{U}$$</code>(-1,1) random matrix 
view:</p>
 
 <div class="highlighter-rouge"><pre class="highlight"><code>val incCoreA = 
Matrices.symmetricUniformView(m, n, seed)
 </code></pre>
 </div>
 
-<p><code class="highlighter-rouge">\(\mathcal{N}\)</code>(-1,1) random matrix 
view:</p>
+<p><code class="highlighter-rouge">$$\mathcal{N}$$</code>(-1,1) random matrix 
view:</p>
 
 <div class="highlighter-rouge"><pre class="highlight"><code>val incCoreA = 
Matrices.gaussianView(m, n, seed)
 </code></pre>

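A short worked example tying the solver and random-matrix helpers above together (the sizes and seed are arbitrary):

    import org.apache.mahout.math.Matrices
    import org.apache.mahout.math.scalabindings._
    import RLikeOps._

    // Materialize a U(-1,1) random 5x5 matrix from the view, plus a right-hand side.
    val A = Matrices.symmetricUniformView(5, 5, 1234).cloned
    val b = dvec(1, 2, 3, 4, 5)

    val x    = solve(A, b)    // solve Ax = b
    val Ainv = solve(A)       // A^{-1}

    // x and Ainv %*% b agree up to round-off.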
http://git-wip-us.apache.org/repos/asf/mahout/blob/bdaf56d2/docs/latest/mahout-samsara/out-of-core-reference.html
----------------------------------------------------------------------
diff --git a/docs/latest/mahout-samsara/out-of-core-reference.html 
b/docs/latest/mahout-samsara/out-of-core-reference.html
index d1b3908..1b33383 100644
--- a/docs/latest/mahout-samsara/out-of-core-reference.html
+++ b/docs/latest/mahout-samsara/out-of-core-reference.html
@@ -324,7 +324,7 @@ inCoreA /: B
 
 <p><strong>Matrix-matrix multiplication %*%</strong>:</p>
 
-<p><code class="highlighter-rouge">\(\mathbf{M}=\mathbf{AB}\)</code></p>
+<p><code class="highlighter-rouge">$$\mathbf{M}=\mathbf{AB}$$</code></p>
 
 <div class="highlighter-rouge"><pre class="highlight"><code>A %*% B
 A %*% inCoreB
@@ -336,7 +336,7 @@ A %*%: B
 <p><em>Note: same as above, whenever operator arguments include both in-core 
and out-of-core arguments, the operator can only be associated with the 
out-of-core (DRM) argument to support the distributed implementation.</em></p>
 
 <p><strong>Matrix-vector multiplication %*%</strong>
-Currently we support a right multiply product of a DRM and an in-core 
Vector(<code class="highlighter-rouge">\(\mathbf{Ax}\)</code>) resulting in a 
single column DRM, which then can be collected in front (usually the desired 
outcome):</p>
+Currently we support a right multiply product of a DRM and an in-core 
Vector(<code class="highlighter-rouge">$$\mathbf{Ax}$$</code>) resulting in a 
single column DRM, which then can be collected in front (usually the desired 
outcome):</p>
 
 <div class="highlighter-rouge"><pre class="highlight"><code>val Ax = A %*% x
 val inCoreX = Ax.collect(::, 0)
@@ -356,7 +356,7 @@ A / 5.0
 </code></pre>
 </div>
 
-<p>Note that <code class="highlighter-rouge">5.0 -: A</code> means <code 
class="highlighter-rouge">\(m_{ij} = 5 - a_{ij}\)</code> and <code 
class="highlighter-rouge">5.0 /: A</code> means <code 
class="highlighter-rouge">\(m_{ij} = \frac{5}{a{ij}}\)</code> for all elements 
of the result.</p>
+<p>Note that <code class="highlighter-rouge">5.0 -: A</code> means <code 
class="highlighter-rouge">$$m_{ij} = 5 - a_{ij}$$</code> and <code 
class="highlighter-rouge">5.0 /: A</code> means <code 
class="highlighter-rouge">$$m_{ij} = \frac{5}{a{ij}}$$</code> for all elements 
of the result.</p>
 
 <h4 id="slicing">Slicing</h4>
 

http://git-wip-us.apache.org/repos/asf/mahout/blob/bdaf56d2/docs/latest/tutorials/samsara/play-with-shell.html
----------------------------------------------------------------------
diff --git a/docs/latest/tutorials/samsara/play-with-shell.html 
b/docs/latest/tutorials/samsara/play-with-shell.html
index 2574abf..fbe89ff 100644
--- a/docs/latest/tutorials/samsara/play-with-shell.html
+++ b/docs/latest/tutorials/samsara/play-with-shell.html
@@ -314,15 +314,15 @@ val drmData = drmParallelize(dense(
 <p>Have a look at this matrix. The first four columns represent the 
ingredients 
 (our features) and the last column (the rating) is the target variable for 
 our regression. <a 
href="https://en.wikipedia.org/wiki/Linear_regression";>Linear regression</a> 
-assumes that the <strong>target variable</strong> <code 
class="highlighter-rouge">\(\mathbf{y}\)</code> is generated by the 
-linear combination of <strong>the feature matrix</strong> <code 
class="highlighter-rouge">\(\mathbf{X}\)</code> with the 
-<strong>parameter vector</strong> <code 
class="highlighter-rouge">\(\boldsymbol{\beta}\)</code> plus the
- <strong>noise</strong> <code 
class="highlighter-rouge">\(\boldsymbol{\varepsilon}\)</code>, summarized in 
the formula 
-<code 
class="highlighter-rouge">\(\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon}\)</code>.
 
+assumes that the <strong>target variable</strong> <code 
class="highlighter-rouge">$$\mathbf{y}$$</code> is generated by the
+linear combination of <strong>the feature matrix</strong> <code 
class="highlighter-rouge">$$\mathbf{X}$$</code> with the
+<strong>parameter vector</strong> <code 
class="highlighter-rouge">$$\boldsymbol{\beta}$$</code> plus the
+ <strong>noise</strong> <code 
class="highlighter-rouge">$$\boldsymbol{\varepsilon}$$</code>, summarized in 
the formula
+<code 
class="highlighter-rouge">$$\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon}$$</code>.
 Our goal is to find an estimate of the parameter vector 
-<code class="highlighter-rouge">\(\boldsymbol{\beta}\)</code> that explains 
the data very well.</p>
+<code class="highlighter-rouge">$$\boldsymbol{\beta}$$</code> that explains 
the data very well.</p>
 
-<p>As a first step, we extract <code 
class="highlighter-rouge">\(\mathbf{X}\)</code> and <code 
class="highlighter-rouge">\(\mathbf{y}\)</code> from our data matrix. We get 
<em>X</em> by slicing: we take all rows (denoted by <code 
class="highlighter-rouge">::</code>) and the first four columns, which have the 
ingredients in milligrams as content. Note that the result is again a DRM. The 
shell will not execute this code yet, it saves the history of operations and 
defers the execution until we really access a result. <strong>Mahout’s DSL 
automatically optimizes and parallelizes all operations on DRMs and runs them 
on Apache Spark.</strong></p>
+<p>As a first step, we extract <code 
class="highlighter-rouge">$$\mathbf{X}$$</code> and <code 
class="highlighter-rouge">$$\mathbf{y}$$</code> from our data matrix. We get 
<em>X</em> by slicing: we take all rows (denoted by <code 
class="highlighter-rouge">::</code>) and the first four columns, which have the 
ingredients in milligrams as content. Note that the result is again a DRM. The 
shell will not execute this code yet, it saves the history of operations and 
defers the execution until we really access a result. <strong>Mahout’s DSL 
automatically optimizes and parallelizes all operations on DRMs and runs them 
on Apache Spark.</strong></p>
 
 <div class="codehilite"><pre>
 val drmX = drmData(::, 0 until 4)
@@ -334,27 +334,27 @@ val drmX = drmData(::, 0 until 4)
 val y = drmData.collect(::, 4)
 </pre></div>
 
-<p>Now we are ready to think about a mathematical way to estimate the 
parameter vector <em>β</em>. A simple textbook approach is <a 
href="https://en.wikipedia.org/wiki/Ordinary_least_squares";>ordinary least 
squares (OLS)</a>, which minimizes the sum of residual squares between the true 
target variable and the prediction of the target variable. In OLS, there is 
even a closed form expression for estimating <code 
class="highlighter-rouge">\(\boldsymbol{\beta}\)</code> as 
-<code 
class="highlighter-rouge">\(\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\mathbf{y}\)</code>.</p>
+<p>Now we are ready to think about a mathematical way to estimate the 
parameter vector <em>β</em>. A simple textbook approach is <a 
href="https://en.wikipedia.org/wiki/Ordinary_least_squares";>ordinary least 
squares (OLS)</a>, which minimizes the sum of residual squares between the true 
target variable and the prediction of the target variable. In OLS, there is 
even a closed form expression for estimating <code 
class="highlighter-rouge">$$\boldsymbol{\beta}$$</code> as
+<code 
class="highlighter-rouge">$$\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\mathbf{y}$$</code>.</p>
 
-<p>The first thing which we compute for this is  <code 
class="highlighter-rouge">\(\mathbf{X}^{\top}\mathbf{X}\)</code>. The code for 
doing this in Mahout’s scala DSL maps directly to the mathematical formula. 
The operation <code class="highlighter-rouge">.t()</code> transposes a matrix 
and analogous to R <code class="highlighter-rouge">%*%</code> denotes matrix 
multiplication.</p>
+<p>The first thing which we compute for this is  <code 
class="highlighter-rouge">$$\mathbf{X}^{\top}\mathbf{X}$$</code>. The code for 
doing this in Mahout’s scala DSL maps directly to the mathematical formula. 
The operation <code class="highlighter-rouge">.t()</code> transposes a matrix 
and analogous to R <code class="highlighter-rouge">%*%</code> denotes matrix 
multiplication.</p>
 
 <div class="codehilite"><pre>
 val drmXtX = drmX.t %*% drmX
 </pre></div>
 
-<p>The same is true for computing <code 
class="highlighter-rouge">\(\mathbf{X}^{\top}\mathbf{y}\)</code>. We can simply 
type the math in scala expressions into the shell. Here, <em>X</em> lives in 
the cluster, while is <em>y</em> in the memory of the driver, and the result is 
a DRM again.</p>
+<p>The same is true for computing <code 
class="highlighter-rouge">$$\mathbf{X}^{\top}\mathbf{y}$$</code>. We can simply 
type the math in scala expressions into the shell. Here, <em>X</em> lives in 
the cluster, while is <em>y</em> in the memory of the driver, and the result is 
a DRM again.</p>
 <div class="codehilite"><pre>
 val drmXty = drmX.t %*% y
 </pre></div>
 
-<p>We’re nearly done. The next step we take is to fetch <code 
class="highlighter-rouge">\(\mathbf{X}^{\top}\mathbf{X}\)</code> and 
-<code class="highlighter-rouge">\(\mathbf{X}^{\top}\mathbf{y}\)</code> into 
the memory of our driver machine (we are targeting 
+<p>We’re nearly done. The next step we take is to fetch <code 
class="highlighter-rouge">$$\mathbf{X}^{\top}\mathbf{X}$$</code> and
+<code class="highlighter-rouge">$$\mathbf{X}^{\top}\mathbf{y}$$</code> into 
the memory of our driver machine (we are targeting
 features matrices that are tall and skinny , 
-so we can assume that <code 
class="highlighter-rouge">\(\mathbf{X}^{\top}\mathbf{X}\)</code> is small 
enough 
+so we can assume that <code 
class="highlighter-rouge">$$\mathbf{X}^{\top}\mathbf{X}$$</code> is small enough
 to fit in). Then, we provide them to an in-memory solver (Mahout provides 
 the an analog to R’s <code class="highlighter-rouge">solve()</code> for 
that) which computes <code class="highlighter-rouge">beta</code>, our 
-OLS estimate of the parameter vector <code 
class="highlighter-rouge">\(\boldsymbol{\beta}\)</code>.</p>
+OLS estimate of the parameter vector <code 
class="highlighter-rouge">$$\boldsymbol{\beta}$$</code>.</p>
 
 <div class="codehilite"><pre>
 val XtX = drmXtX.collect
@@ -371,9 +371,9 @@ as much as possible, while still retaining decent 
performance and
 scalability.</p>
 
 <p>We can now check how well our model fits its training data. 
-First, we multiply the feature matrix <code 
class="highlighter-rouge">\(\mathbf{X}\)</code> by our estimate of 
-<code class="highlighter-rouge">\(\boldsymbol{\beta}\)</code>. Then, we look 
at the difference (via L2-norm) of 
-the target variable <code class="highlighter-rouge">\(\mathbf{y}\)</code> to 
the fitted target variable:</p>
+First, we multiply the feature matrix <code 
class="highlighter-rouge">$$\mathbf{X}$$</code> by our estimate of
+<code class="highlighter-rouge">$$\boldsymbol{\beta}$$</code>. Then, we look 
at the difference (via L2-norm) of
+the target variable <code class="highlighter-rouge">$$\mathbf{y}$$</code> to 
the fitted target variable:</p>
 
 <div class="codehilite"><pre>
 val yFitted = (drmX %*% beta).collect(::, 0)
@@ -406,7 +406,7 @@ def goodnessOfFit(drmX: DrmLike[Int], beta: Vector, y: 
Vector) = {
 model. Usually there is a constant bias term added to the model. Without 
 that, our model always crosses through the origin and we only learn the 
 right angle. An easy way to add such a bias term to our model is to add a 
-column of ones to the feature matrix <code 
class="highlighter-rouge">\(\mathbf{X}\)</code>. 
+column of ones to the feature matrix <code 
class="highlighter-rouge">$$\mathbf{X}$$</code>.
 The corresponding weight in the parameter vector will then be the bias 
term.</p>
 
 <p>Here is how we add a bias column:</p>

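The snippet that follows that sentence on the page lies outside this hunk. One way to add the bias column with the DSL used in the tutorial is a mapBlock that widens each block by a column of ones; this is a sketch, not necessarily the page's own code, and it relies on the packages the Mahout shell pre-imports:

    val drmXwithBias = drmX.mapBlock(ncol = drmX.ncol + 1) {
      case (keys, block) =>
        // New block with one extra column; copy X and set the last column to 1.
        val wider = block.like(block.nrow, block.ncol + 1)
        wider(::, 0 until block.ncol) := block
        wider(::, block.ncol) := 1
        keys -> wider
    }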
http://git-wip-us.apache.org/repos/asf/mahout/blob/bdaf56d2/docs/latest/tutorials/samsara/spark-naive-bayes.html
----------------------------------------------------------------------
diff --git a/docs/latest/tutorials/samsara/spark-naive-bayes.html 
b/docs/latest/tutorials/samsara/spark-naive-bayes.html
index dfa8a6d..b0b4819 100644
--- a/docs/latest/tutorials/samsara/spark-naive-bayes.html
+++ b/docs/latest/tutorials/samsara/spark-naive-bayes.html
@@ -181,38 +181,38 @@
 <p>As described in <a 
href="http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf";>[1]</a> Mahout 
Naive Bayes is broken down into the following steps (assignments are over all 
possible index values):</p>
 
 <ul>
-  <li>Let <code 
class="highlighter-rouge">\(\vec{d}=(\vec{d_1},...,\vec{d_n})\)</code> be a set 
of documents; <code class="highlighter-rouge">\(d_{ij}\)</code> is the count of 
word <code class="highlighter-rouge">\(i\)</code> in document <code 
class="highlighter-rouge">\(j\)</code>.</li>
-  <li>Let <code class="highlighter-rouge">\(\vec{y}=(y_1,...,y_n)\)</code> be 
their labels.</li>
-  <li>Let <code class="highlighter-rouge">\(\alpha_i\)</code> be a smoothing 
parameter for all words in the vocabulary; let <code 
class="highlighter-rouge">\(\alpha=\sum_i{\alpha_i}\)</code>.</li>
-  <li><strong>Preprocessing</strong>(via seq2Sparse) TF-IDF transformation and 
L2 length normalization of <code class="highlighter-rouge">\(\vec{d}\)</code>
+  <li>Let <code 
class="highlighter-rouge">$$\vec{d}=(\vec{d_1},...,\vec{d_n})$$</code> be a set 
of documents; <code class="highlighter-rouge">$$d_{ij}$$</code> is the count of 
word <code class="highlighter-rouge">$$i$$</code> in document <code 
class="highlighter-rouge">$$j$$</code>.</li>
+  <li>Let <code class="highlighter-rouge">$$\vec{y}=(y_1,...,y_n)$$</code> be 
their labels.</li>
+  <li>Let <code class="highlighter-rouge">$$\alpha_i$$</code> be a smoothing 
parameter for all words in the vocabulary; let <code 
class="highlighter-rouge">$$\alpha=\sum_i{\alpha_i}$$</code>.</li>
+  <li><strong>Preprocessing</strong>(via seq2Sparse) TF-IDF transformation and 
L2 length normalization of <code class="highlighter-rouge">$$\vec{d}$$</code>
     <ol>
-      <li><code class="highlighter-rouge">\(d_{ij} = 
\sqrt{d_{ij}}\)</code></li>
-      <li><code class="highlighter-rouge">\(d_{ij} = 
d_{ij}\left(\log{\frac{\sum_k1}{\sum_k\delta_{ik}+1}}+1\right)\)</code></li>
-      <li><code class="highlighter-rouge">\(d_{ij} 
=\frac{d_{ij}}{\sqrt{\sum_k{d_{kj}^2}}}\)</code></li>
+      <li><code class="highlighter-rouge">$$d_{ij} = 
\sqrt{d_{ij}}$$</code></li>
+      <li><code class="highlighter-rouge">$$d_{ij} = 
d_{ij}\left(\log{\frac{\sum_k1}{\sum_k\delta_{ik}+1}}+1\right)$$</code></li>
+      <li><code class="highlighter-rouge">$$d_{ij} 
=\frac{d_{ij}}{\sqrt{\sum_k{d_{kj}^2}}}$$</code></li>
     </ol>
   </li>
-  <li><strong>Training: Bayes</strong><code 
class="highlighter-rouge">\((\vec{d},\vec{y})\)</code> calculate term weights 
<code class="highlighter-rouge">\(w_{ci}\)</code> as:
+  <li><strong>Training: Bayes</strong><code 
class="highlighter-rouge">$$(\vec{d},\vec{y})$$</code> calculate term weights 
<code class="highlighter-rouge">$$w_{ci}$$</code> as:
     <ol>
-      <li><code 
class="highlighter-rouge">\(\hat\theta_{ci}=\frac{d_{ic}+\alpha_i}{\sum_k{d_{kc}}+\alpha}\)</code></li>
-      <li><code 
class="highlighter-rouge">\(w_{ci}=\log{\hat\theta_{ci}}\)</code></li>
+      <li><code 
class="highlighter-rouge">$$\hat\theta_{ci}=\frac{d_{ic}+\alpha_i}{\sum_k{d_{kc}}+\alpha}$$</code></li>
+      <li><code 
class="highlighter-rouge">$$w_{ci}=\log{\hat\theta_{ci}}$$</code></li>
     </ol>
   </li>
-  <li><strong>Training: CBayes</strong><code 
class="highlighter-rouge">\((\vec{d},\vec{y})\)</code> calculate term weights 
<code class="highlighter-rouge">\(w_{ci}\)</code> as:
+  <li><strong>Training: CBayes</strong><code 
class="highlighter-rouge">$$(\vec{d},\vec{y})$$</code> calculate term weights 
<code class="highlighter-rouge">$$w_{ci}$$</code> as:
     <ol>
-      <li><code class="highlighter-rouge">\(\hat\theta_{ci} = 
\frac{\sum_{j:y_j\neq c}d_{ij}+\alpha_i}{\sum_{j:y_j\neq 
c}{\sum_k{d_{kj}}}+\alpha}\)</code></li>
-      <li><code 
class="highlighter-rouge">\(w_{ci}=-\log{\hat\theta_{ci}}\)</code></li>
-      <li><code class="highlighter-rouge">\(w_{ci}=\frac{w_{ci}}{\sum_i \lvert 
w_{ci}\rvert}\)</code></li>
+      <li><code class="highlighter-rouge">$$\hat\theta_{ci} = 
\frac{\sum_{j:y_j\neq c}d_{ij}+\alpha_i}{\sum_{j:y_j\neq 
c}{\sum_k{d_{kj}}}+\alpha}$$</code></li>
+      <li><code 
class="highlighter-rouge">$$w_{ci}=-\log{\hat\theta_{ci}}$$</code></li>
+      <li><code class="highlighter-rouge">$$w_{ci}=\frac{w_{ci}}{\sum_i \lvert 
w_{ci}\rvert}$$</code></li>
     </ol>
   </li>
   <li><strong>Label Assignment/Testing:</strong>
     <ol>
-      <li>Let <code class="highlighter-rouge">\(\vec{t}= 
(t_1,...,t_n)\)</code> be a test document; let <code 
class="highlighter-rouge">\(t_i\)</code> be the count of the word <code 
class="highlighter-rouge">\(t\)</code>.</li>
-      <li>Label the document according to <code 
class="highlighter-rouge">\(l(t)=\arg\max_c \sum\limits_{i} t_i 
w_{ci}\)</code></li>
+      <li>Let <code class="highlighter-rouge">$$\vec{t}= 
(t_1,...,t_n)$$</code> be a test document; let <code 
class="highlighter-rouge">$$t_i$$</code> be the count of the word <code 
class="highlighter-rouge">$$t$$</code>.</li>
+      <li>Label the document according to <code 
class="highlighter-rouge">$$l(t)=\arg\max_c \sum\limits_{i} t_i 
w_{ci}$$</code></li>
     </ol>
   </li>
 </ul>
 
-<p>As we can see, the main difference between Bayes and CBayes is the weight 
calculation step.  Where Bayes weighs terms more heavily based on the 
likelihood that they belong to class <code 
class="highlighter-rouge">\(c\)</code>, CBayes seeks to maximize term weights 
on the likelihood that they do not belong to any other class.</p>
+<p>As we can see, the main difference between Bayes and CBayes is the weight 
calculation step.  Where Bayes weighs terms more heavily based on the 
likelihood that they belong to class <code 
class="highlighter-rouge">$$c$$</code>, CBayes seeks to maximize term weights 
on the likelihood that they do not belong to any other class.</p>
 
 <h2 id="running-from-the-command-line">Running from the command line</h2>
 
