Repository: mahout
Updated Branches:
  refs/heads/asf-site be7bef00c -> bdaf56d26
replace \( and \) with 22996 in /latest dir (double dollarsigns) Project: http://git-wip-us.apache.org/repos/asf/mahout/repo Commit: http://git-wip-us.apache.org/repos/asf/mahout/commit/bdaf56d2 Tree: http://git-wip-us.apache.org/repos/asf/mahout/tree/bdaf56d2 Diff: http://git-wip-us.apache.org/repos/asf/mahout/diff/bdaf56d2 Branch: refs/heads/asf-site Commit: bdaf56d2692396ab13791e92ed617b7eb5c5fcd3 Parents: be7bef0 Author: Andrew Palumbo <[email protected]> Authored: Sun Dec 24 14:12:10 2017 -0800 Committer: Andrew Palumbo <[email protected]> Committed: Sun Dec 24 14:13:12 2017 -0800 ---------------------------------------------------------------------- .../algorithms/linear-algebra/d-spca.html | 42 ++++++++--------- .../algorithms/linear-algebra/d-ssvd.html | 48 ++++++++++---------- .../map-reduce/classification/bayesian.html | 34 +++++++------- .../clustering/spectral-clustering.html | 12 ++--- docs/latest/algorithms/reccomenders/d-als.html | 2 +- docs/latest/algorithms/regression/ols.html | 4 +- docs/latest/index.html | 4 +- .../mahout-samsara/in-core-reference.html | 14 +++--- .../mahout-samsara/out-of-core-reference.html | 6 +-- .../tutorials/samsara/play-with-shell.html | 38 ++++++++-------- .../tutorials/samsara/spark-naive-bayes.html | 34 +++++++------- 11 files changed, 119 insertions(+), 119 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/mahout/blob/bdaf56d2/docs/latest/algorithms/linear-algebra/d-spca.html ---------------------------------------------------------------------- diff --git a/docs/latest/algorithms/linear-algebra/d-spca.html b/docs/latest/algorithms/linear-algebra/d-spca.html index 9809d7d..5489626 100644 --- a/docs/latest/algorithms/linear-algebra/d-spca.html +++ b/docs/latest/algorithms/linear-algebra/d-spca.html @@ -167,36 +167,36 @@ <div class="col-lg-8"> <h2 id="intro">Intro</h2> -<p>Mahout has a distributed implementation of Stochastic PCA<a href="Lyubimov and Palumbo, ["Apache Mahout: Beyond MapReduce; Distributed Algorithm Design"](https://www.amazon.com/Apache-Mahout-MapReduce-Dmitriy-Lyubimov/dp/1523775785)">1</a>. This algorithm computes the exact equivalent of Mahoutâs dssvd(<code class="highlighter-rouge">\(\mathbf{A-1\mu^\top}\)</code>) by modifying the <code class="highlighter-rouge">dssvd</code> algorithm so as to avoid forming <code class="highlighter-rouge">\(\mathbf{A-1\mu^\top}\)</code>, which would densify a sparse input. Thus, it is suitable for work with both dense and sparse inputs.</p> +<p>Mahout has a distributed implementation of Stochastic PCA<a href="Lyubimov and Palumbo, ["Apache Mahout: Beyond MapReduce; Distributed Algorithm Design"](https://www.amazon.com/Apache-Mahout-MapReduce-Dmitriy-Lyubimov/dp/1523775785)">1</a>. This algorithm computes the exact equivalent of Mahoutâs dssvd(<code class="highlighter-rouge">$$\mathbf{A-1\mu^\top}$$</code>) by modifying the <code class="highlighter-rouge">dssvd</code> algorithm so as to avoid forming <code class="highlighter-rouge">$$\mathbf{A-1\mu^\top}$$</code>, which would densify a sparse input. 
Thus, it is suitable for work with both dense and sparse inputs.</p> <h2 id="algorithm">Algorithm</h2> -<p>Given an <em>m</em> <code class="highlighter-rouge">\(\times\)</code> <em>n</em> matrix <code class="highlighter-rouge">\(\mathbf{A}\)</code>, a target rank <em>k</em>, and an oversampling parameter <em>p</em>, this procedure computes a <em>k</em>-rank PCA by finding the unknowns in <code class="highlighter-rouge">\(\mathbf{Aâ1\mu^\top \approx U\Sigma V^\top}\)</code>:</p> +<p>Given an <em>m</em> <code class="highlighter-rouge">$$\times$$</code> <em>n</em> matrix <code class="highlighter-rouge">$$\mathbf{A}$$</code>, a target rank <em>k</em>, and an oversampling parameter <em>p</em>, this procedure computes a <em>k</em>-rank PCA by finding the unknowns in <code class="highlighter-rouge">$$\mathbf{Aâ1\mu^\top \approx U\Sigma V^\top}$$</code>:</p> <ol> - <li>Create seed for random <em>n</em> <code class="highlighter-rouge">\(\times\)</code> <em>(k+p)</em> matrix <code class="highlighter-rouge">\(\Omega\)</code>.</li> - <li><code class="highlighter-rouge">\(\mathbf{s_\Omega \leftarrow \Omega^\top \mu}\)</code>.</li> - <li><code class="highlighter-rouge">\(\mathbf{Y_0 \leftarrow A\Omega â 1 {s_\Omega}^\top, Y \in \mathbb{R}^{m\times(k+p)}}\)</code>.</li> - <li>Column-orthonormalize <code class="highlighter-rouge">\(\mathbf{Y_0} \rightarrow \mathbf{Q}\)</code> by computing thin decomposition <code class="highlighter-rouge">\(\mathbf{Y_0} = \mathbf{QR}\)</code>. Also, <code class="highlighter-rouge">\(\mathbf{Q}\in\mathbb{R}^{m\times(k+p)}, \mathbf{R}\in\mathbb{R}^{(k+p)\times(k+p)}\)</code>.</li> - <li><code class="highlighter-rouge">\(\mathbf{s_Q \leftarrow Q^\top 1}\)</code>.</li> - <li><code class="highlighter-rouge">\(\mathbf{B_0 \leftarrow Q^\top A: B \in \mathbb{R}^{(k+p)\times n}}\)</code>.</li> - <li><code class="highlighter-rouge">\(\mathbf{s_B \leftarrow {B_0}^\top \mu}\)</code>.</li> + <li>Create seed for random <em>n</em> <code class="highlighter-rouge">$$\times$$</code> <em>(k+p)</em> matrix <code class="highlighter-rouge">$$\Omega$$</code>.</li> + <li><code class="highlighter-rouge">$$\mathbf{s_\Omega \leftarrow \Omega^\top \mu}$$</code>.</li> + <li><code class="highlighter-rouge">$$\mathbf{Y_0 \leftarrow A\Omega â 1 {s_\Omega}^\top, Y \in \mathbb{R}^{m\times(k+p)}}$$</code>.</li> + <li>Column-orthonormalize <code class="highlighter-rouge">$$\mathbf{Y_0} \rightarrow \mathbf{Q}$$</code> by computing thin decomposition <code class="highlighter-rouge">$$\mathbf{Y_0} = \mathbf{QR}$$</code>. 
Also, <code class="highlighter-rouge">$$\mathbf{Q}\in\mathbb{R}^{m\times(k+p)}, \mathbf{R}\in\mathbb{R}^{(k+p)\times(k+p)}$$</code>.</li> + <li><code class="highlighter-rouge">$$\mathbf{s_Q \leftarrow Q^\top 1}$$</code>.</li> + <li><code class="highlighter-rouge">$$\mathbf{B_0 \leftarrow Q^\top A: B \in \mathbb{R}^{(k+p)\times n}}$$</code>.</li> + <li><code class="highlighter-rouge">$$\mathbf{s_B \leftarrow {B_0}^\top \mu}$$</code>.</li> <li>For <em>i</em> in 1..<em>q</em> repeat (power iterations): <ul> - <li>For <em>j</em> in 1..<em>n</em> apply <code class="highlighter-rouge">\(\mathbf{(B_{iâ1})_{âj} \leftarrow (B_{iâ1})_{âj}â\mu_j s_Q}\)</code>.</li> - <li><code class="highlighter-rouge">\(\mathbf{Y_i \leftarrow A{B_{iâ1}}^\topâ1(s_Bâ\mu^\top \mu s_Q)^\top}\)</code>.</li> - <li>Column-orthonormalize <code class="highlighter-rouge">\(\mathbf{Y_i} \rightarrow \mathbf{Q}\)</code> by computing thin decomposition <code class="highlighter-rouge">\(\mathbf{Y_i = QR}\)</code>.</li> - <li><code class="highlighter-rouge">\(\mathbf{s_Q \leftarrow Q^\top 1}\)</code>.</li> - <li><code class="highlighter-rouge">\(\mathbf{B_i \leftarrow Q^\top A}\)</code>.</li> - <li><code class="highlighter-rouge">\(\mathbf{s_B \leftarrow {B_i}^\top \mu}\)</code>.</li> + <li>For <em>j</em> in 1..<em>n</em> apply <code class="highlighter-rouge">$$\mathbf{(B_{iâ1})_{âj} \leftarrow (B_{iâ1})_{âj}â\mu_j s_Q}$$</code>.</li> + <li><code class="highlighter-rouge">$$\mathbf{Y_i \leftarrow A{B_{iâ1}}^\topâ1(s_Bâ\mu^\top \mu s_Q)^\top}$$</code>.</li> + <li>Column-orthonormalize <code class="highlighter-rouge">$$\mathbf{Y_i} \rightarrow \mathbf{Q}$$</code> by computing thin decomposition <code class="highlighter-rouge">$$\mathbf{Y_i = QR}$$</code>.</li> + <li><code class="highlighter-rouge">$$\mathbf{s_Q \leftarrow Q^\top 1}$$</code>.</li> + <li><code class="highlighter-rouge">$$\mathbf{B_i \leftarrow Q^\top A}$$</code>.</li> + <li><code class="highlighter-rouge">$$\mathbf{s_B \leftarrow {B_i}^\top \mu}$$</code>.</li> </ul> </li> - <li>Let <code class="highlighter-rouge">\(\mathbf{C \triangleq s_Q {s_B}^\top}\)</code>. <code class="highlighter-rouge">\(\mathbf{M \leftarrow B_q {B_q}^\top â C â C^\top + \mu^\top \mu s_Q {s_Q}^\top}\)</code>.</li> - <li>Compute an eigensolution of the small symmetric <code class="highlighter-rouge">\(\mathbf{M = \hat{U} \Lambda \hat{U}^\top: M \in \mathbb{R}^{(k+p)\times(k+p)}}\)</code>.</li> - <li>The singular values <code class="highlighter-rouge">\(\Sigma = \Lambda^{\circ 0.5}\)</code>, or, in other words, <code class="highlighter-rouge">\(\mathbf{\sigma_i= \sqrt{\lambda_i}}\)</code>.</li> - <li>If needed, compute <code class="highlighter-rouge">\(\mathbf{U = Q\hat{U}}\)</code>.</li> - <li>If needed, compute <code class="highlighter-rouge">\(\mathbf{V = B^\top \hat{U} \Sigma^{â1}}\)</code>.</li> - <li>If needed, items converted to the PCA space can be computed as <code class="highlighter-rouge">\(\mathbf{U\Sigma}\)</code>.</li> + <li>Let <code class="highlighter-rouge">$$\mathbf{C \triangleq s_Q {s_B}^\top}$$</code>. 
<code class="highlighter-rouge">$$\mathbf{M \leftarrow B_q {B_q}^\top â C â C^\top + \mu^\top \mu s_Q {s_Q}^\top}$$</code>.</li> + <li>Compute an eigensolution of the small symmetric <code class="highlighter-rouge">$$\mathbf{M = \hat{U} \Lambda \hat{U}^\top: M \in \mathbb{R}^{(k+p)\times(k+p)}}$$</code>.</li> + <li>The singular values <code class="highlighter-rouge">$$\Sigma = \Lambda^{\circ 0.5}$$</code>, or, in other words, <code class="highlighter-rouge">$$\mathbf{\sigma_i= \sqrt{\lambda_i}}$$</code>.</li> + <li>If needed, compute <code class="highlighter-rouge">$$\mathbf{U = Q\hat{U}}$$</code>.</li> + <li>If needed, compute <code class="highlighter-rouge">$$\mathbf{V = B^\top \hat{U} \Sigma^{â1}}$$</code>.</li> + <li>If needed, items converted to the PCA space can be computed as <code class="highlighter-rouge">$$\mathbf{U\Sigma}$$</code>.</li> </ol> <h2 id="implementation">Implementation</h2> http://git-wip-us.apache.org/repos/asf/mahout/blob/bdaf56d2/docs/latest/algorithms/linear-algebra/d-ssvd.html ---------------------------------------------------------------------- diff --git a/docs/latest/algorithms/linear-algebra/d-ssvd.html b/docs/latest/algorithms/linear-algebra/d-ssvd.html index 3f16ca0..ac339ca 100644 --- a/docs/latest/algorithms/linear-algebra/d-ssvd.html +++ b/docs/latest/algorithms/linear-algebra/d-ssvd.html @@ -171,50 +171,50 @@ <h2 id="modified-ssvd-algorithm">Modified SSVD Algorithm</h2> -<p>Given an <code class="highlighter-rouge">\(m\times n\)</code> -matrix <code class="highlighter-rouge">\(\mathbf{A}\)</code>, a target rank <code class="highlighter-rouge">\(k\in\mathbb{N}_{1}\)</code> -, an oversampling parameter <code class="highlighter-rouge">\(p\in\mathbb{N}_{1}\)</code>, -and the number of additional power iterations <code class="highlighter-rouge">\(q\in\mathbb{N}_{0}\)</code>, -this procedure computes an <code class="highlighter-rouge">\(m\times\left(k+p\right)\)</code> -SVD <code class="highlighter-rouge">\(\mathbf{A\approx U}\boldsymbol{\Sigma}\mathbf{V}^{\top}\)</code>:</p> +<p>Given an <code class="highlighter-rouge">$$m\times n$$</code> +matrix <code class="highlighter-rouge">$$\mathbf{A}$$</code>, a target rank <code class="highlighter-rouge">$$k\in\mathbb{N}_{1}$$</code> +, an oversampling parameter <code class="highlighter-rouge">$$p\in\mathbb{N}_{1}$$</code>, +and the number of additional power iterations <code class="highlighter-rouge">$$q\in\mathbb{N}_{0}$$</code>, +this procedure computes an <code class="highlighter-rouge">$$m\times\left(k+p\right)$$</code> +SVD <code class="highlighter-rouge">$$\mathbf{A\approx U}\boldsymbol{\Sigma}\mathbf{V}^{\top}$$</code>:</p> <ol> <li> - <p>Create seed for random <code class="highlighter-rouge">\(n\times\left(k+p\right)\)</code> - matrix <code class="highlighter-rouge">\(\boldsymbol{\Omega}\)</code>. The seed defines matrix <code class="highlighter-rouge">\(\mathbf{\Omega}\)</code> + <p>Create seed for random <code class="highlighter-rouge">$$n\times\left(k+p\right)$$</code> + matrix <code class="highlighter-rouge">$$\boldsymbol{\Omega}$$</code>. 
The seed defines matrix <code class="highlighter-rouge">$$\mathbf{\Omega}$$</code> using Gaussian unit vectors per one of suggestions in [Halko, Martinsson, Tropp].</p> </li> <li> - <p><code class="highlighter-rouge">\(\mathbf{Y=A\boldsymbol{\Omega}},\,\mathbf{Y}\in\mathbb{R}^{m\times\left(k+p\right)}\)</code></p> + <p><code class="highlighter-rouge">$$\mathbf{Y=A\boldsymbol{\Omega}},\,\mathbf{Y}\in\mathbb{R}^{m\times\left(k+p\right)}$$</code></p> </li> <li> - <p>Column-orthonormalize <code class="highlighter-rouge">\(\mathbf{Y}\rightarrow\mathbf{Q}\)</code> - by computing thin decomposition <code class="highlighter-rouge">\(\mathbf{Y}=\mathbf{Q}\mathbf{R}\)</code>. - Also, <code class="highlighter-rouge">\(\mathbf{Q}\in\mathbb{R}^{m\times\left(k+p\right)},\,\mathbf{R}\in\mathbb{R}^{\left(k+p\right)\times\left(k+p\right)}\)</code>; denoted as <code class="highlighter-rouge">\(\mathbf{Q}=\mbox{qr}\left(\mathbf{Y}\right).\mathbf{Q}\)</code></p> + <p>Column-orthonormalize <code class="highlighter-rouge">$$\mathbf{Y}\rightarrow\mathbf{Q}$$</code> + by computing thin decomposition <code class="highlighter-rouge">$$\mathbf{Y}=\mathbf{Q}\mathbf{R}$$</code>. + Also, <code class="highlighter-rouge">$$\mathbf{Q}\in\mathbb{R}^{m\times\left(k+p\right)},\,\mathbf{R}\in\mathbb{R}^{\left(k+p\right)\times\left(k+p\right)}$$</code>; denoted as <code class="highlighter-rouge">$$\mathbf{Q}=\mbox{qr}\left(\mathbf{Y}\right).\mathbf{Q}$$</code></p> </li> <li> - <p><code class="highlighter-rouge">\(\mathbf{B}_{0}=\mathbf{Q}^{\top}\mathbf{A}:\,\,\mathbf{B}\in\mathbb{R}^{\left(k+p\right)\times n}\)</code>.</p> + <p><code class="highlighter-rouge">$$\mathbf{B}_{0}=\mathbf{Q}^{\top}\mathbf{A}:\,\,\mathbf{B}\in\mathbb{R}^{\left(k+p\right)\times n}$$</code>.</p> </li> <li> - <p>If <code class="highlighter-rouge">\(q>0\)</code> - repeat: for <code class="highlighter-rouge">\(i=1..q\)</code>: - <code class="highlighter-rouge">\(\mathbf{B}_{i}^{\top}=\mathbf{A}^{\top}\mbox{qr}\left(\mathbf{A}\mathbf{B}_{i-1}^{\top}\right).\mathbf{Q}\)</code> + <p>If <code class="highlighter-rouge">$$q>0$$</code> + repeat: for <code class="highlighter-rouge">$$i=1..q$$</code>: + <code class="highlighter-rouge">$$\mathbf{B}_{i}^{\top}=\mathbf{A}^{\top}\mbox{qr}\left(\mathbf{A}\mathbf{B}_{i-1}^{\top}\right).\mathbf{Q}$$</code> (power iterations step).</p> </li> <li> - <p>Compute Eigensolution of a small Hermitian <code class="highlighter-rouge">\(\mathbf{B}_{q}\mathbf{B}_{q}^{\top}=\mathbf{\hat{U}}\boldsymbol{\Lambda}\mathbf{\hat{U}}^{\top}\)</code>, - <code class="highlighter-rouge">\(\mathbf{B}_{q}\mathbf{B}_{q}^{\top}\in\mathbb{R}^{\left(k+p\right)\times\left(k+p\right)}\)</code>.</p> + <p>Compute Eigensolution of a small Hermitian <code class="highlighter-rouge">$$\mathbf{B}_{q}\mathbf{B}_{q}^{\top}=\mathbf{\hat{U}}\boldsymbol{\Lambda}\mathbf{\hat{U}}^{\top}$$</code>, + <code class="highlighter-rouge">$$\mathbf{B}_{q}\mathbf{B}_{q}^{\top}\in\mathbb{R}^{\left(k+p\right)\times\left(k+p\right)}$$</code>.</p> </li> <li> - <p>Singular values <code class="highlighter-rouge">\(\mathbf{\boldsymbol{\Sigma}}=\boldsymbol{\Lambda}^{0.5}\)</code>, - or, in other words, <code class="highlighter-rouge">\(s_{i}=\sqrt{\sigma_{i}}\)</code>.</p> + <p>Singular values <code class="highlighter-rouge">$$\mathbf{\boldsymbol{\Sigma}}=\boldsymbol{\Lambda}^{0.5}$$</code>, + or, in other words, <code class="highlighter-rouge">$$s_{i}=\sqrt{\sigma_{i}}$$</code>.</p> </li> <li> - <p>If needed, compute <code 
class="highlighter-rouge">\(\mathbf{U}=\mathbf{Q}\hat{\mathbf{U}}\)</code>.</p> + <p>If needed, compute <code class="highlighter-rouge">$$\mathbf{U}=\mathbf{Q}\hat{\mathbf{U}}$$</code>.</p> </li> <li> - <p>If needed, compute <code class="highlighter-rouge">\(\mathbf{V}=\mathbf{B}_{q}^{\top}\hat{\mathbf{U}}\boldsymbol{\Sigma}^{-1}\)</code>. -Another way is <code class="highlighter-rouge">\(\mathbf{V}=\mathbf{A}^{\top}\mathbf{U}\boldsymbol{\Sigma}^{-1}\)</code>.</p> + <p>If needed, compute <code class="highlighter-rouge">$$\mathbf{V}=\mathbf{B}_{q}^{\top}\hat{\mathbf{U}}\boldsymbol{\Sigma}^{-1}$$</code>. +Another way is <code class="highlighter-rouge">$$\mathbf{V}=\mathbf{A}^{\top}\mathbf{U}\boldsymbol{\Sigma}^{-1}$$</code>.</p> </li> </ol> @@ -281,7 +281,7 @@ Another way is <code class="highlighter-rouge">\(\mathbf{V}=\mathbf{A}^{\top}\ma </code></pre> </div> -<p>Note: As a side effect of checkpointing, U and V values are returned as logical operators (i.e. they are neither checkpointed nor computed). Therefore there is no physical work actually done to compute <code class="highlighter-rouge">\(\mathbf{U}\)</code> or <code class="highlighter-rouge">\(\mathbf{V}\)</code> until they are used in a subsequent expression.</p> +<p>Note: As a side effect of checkpointing, U and V values are returned as logical operators (i.e. they are neither checkpointed nor computed). Therefore there is no physical work actually done to compute <code class="highlighter-rouge">$$\mathbf{U}$$</code> or <code class="highlighter-rouge">$$\mathbf{V}$$</code> until they are used in a subsequent expression.</p> <h2 id="usage">Usage</h2> http://git-wip-us.apache.org/repos/asf/mahout/blob/bdaf56d2/docs/latest/algorithms/map-reduce/classification/bayesian.html ---------------------------------------------------------------------- diff --git a/docs/latest/algorithms/map-reduce/classification/bayesian.html b/docs/latest/algorithms/map-reduce/classification/bayesian.html index 9c70058..5d11c37 100644 --- a/docs/latest/algorithms/map-reduce/classification/bayesian.html +++ b/docs/latest/algorithms/map-reduce/classification/bayesian.html @@ -181,38 +181,38 @@ <p>As described in <a href="http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf">[1]</a> Mahout Naive Bayes is broken down into the following steps (assignments are over all possible index values):</p> <ul> - <li>Let <code class="highlighter-rouge">\(\vec{d}=(\vec{d_1},...,\vec{d_n})\)</code> be a set of documents; <code class="highlighter-rouge">\(d_{ij}\)</code> is the count of word <code class="highlighter-rouge">\(i\)</code> in document <code class="highlighter-rouge">\(j\)</code>.</li> - <li>Let <code class="highlighter-rouge">\(\vec{y}=(y_1,...,y_n)\)</code> be their labels.</li> - <li>Let <code class="highlighter-rouge">\(\alpha_i\)</code> be a smoothing parameter for all words in the vocabulary; let <code class="highlighter-rouge">\(\alpha=\sum_i{\alpha_i}\)</code>.</li> - <li><strong>Preprocessing</strong>(via seq2Sparse) TF-IDF transformation and L2 length normalization of <code class="highlighter-rouge">\(\vec{d}\)</code> + <li>Let <code class="highlighter-rouge">$$\vec{d}=(\vec{d_1},...,\vec{d_n})$$</code> be a set of documents; <code class="highlighter-rouge">$$d_{ij}$$</code> is the count of word <code class="highlighter-rouge">$$i$$</code> in document <code class="highlighter-rouge">$$j$$</code>.</li> + <li>Let <code class="highlighter-rouge">$$\vec{y}=(y_1,...,y_n)$$</code> be their labels.</li> + <li>Let <code 
class="highlighter-rouge">$$\alpha_i$$</code> be a smoothing parameter for all words in the vocabulary; let <code class="highlighter-rouge">$$\alpha=\sum_i{\alpha_i}$$</code>.</li> + <li><strong>Preprocessing</strong>(via seq2Sparse) TF-IDF transformation and L2 length normalization of <code class="highlighter-rouge">$$\vec{d}$$</code> <ol> - <li><code class="highlighter-rouge">\(d_{ij} = \sqrt{d_{ij}}\)</code></li> - <li><code class="highlighter-rouge">\(d_{ij} = d_{ij}\left(\log{\frac{\sum_k1}{\sum_k\delta_{ik}+1}}+1\right)\)</code></li> - <li><code class="highlighter-rouge">\(d_{ij} =\frac{d_{ij}}{\sqrt{\sum_k{d_{kj}^2}}}\)</code></li> + <li><code class="highlighter-rouge">$$d_{ij} = \sqrt{d_{ij}}$$</code></li> + <li><code class="highlighter-rouge">$$d_{ij} = d_{ij}\left(\log{\frac{\sum_k1}{\sum_k\delta_{ik}+1}}+1\right)$$</code></li> + <li><code class="highlighter-rouge">$$d_{ij} =\frac{d_{ij}}{\sqrt{\sum_k{d_{kj}^2}}}$$</code></li> </ol> </li> - <li><strong>Training: Bayes</strong><code class="highlighter-rouge">\((\vec{d},\vec{y})\)</code> calculate term weights <code class="highlighter-rouge">\(w_{ci}\)</code> as: + <li><strong>Training: Bayes</strong><code class="highlighter-rouge">$$(\vec{d},\vec{y})$$</code> calculate term weights <code class="highlighter-rouge">$$w_{ci}$$</code> as: <ol> - <li><code class="highlighter-rouge">\(\hat\theta_{ci}=\frac{d_{ic}+\alpha_i}{\sum_k{d_{kc}}+\alpha}\)</code></li> - <li><code class="highlighter-rouge">\(w_{ci}=\log{\hat\theta_{ci}}\)</code></li> + <li><code class="highlighter-rouge">$$\hat\theta_{ci}=\frac{d_{ic}+\alpha_i}{\sum_k{d_{kc}}+\alpha}$$</code></li> + <li><code class="highlighter-rouge">$$w_{ci}=\log{\hat\theta_{ci}}$$</code></li> </ol> </li> - <li><strong>Training: CBayes</strong><code class="highlighter-rouge">\((\vec{d},\vec{y})\)</code> calculate term weights <code class="highlighter-rouge">\(w_{ci}\)</code> as: + <li><strong>Training: CBayes</strong><code class="highlighter-rouge">$$(\vec{d},\vec{y})$$</code> calculate term weights <code class="highlighter-rouge">$$w_{ci}$$</code> as: <ol> - <li><code class="highlighter-rouge">\(\hat\theta_{ci} = \frac{\sum_{j:y_j\neq c}d_{ij}+\alpha_i}{\sum_{j:y_j\neq c}{\sum_k{d_{kj}}}+\alpha}\)</code></li> - <li><code class="highlighter-rouge">\(w_{ci}=-\log{\hat\theta_{ci}}\)</code></li> - <li><code class="highlighter-rouge">\(w_{ci}=\frac{w_{ci}}{\sum_i \lvert w_{ci}\rvert}\)</code></li> + <li><code class="highlighter-rouge">$$\hat\theta_{ci} = \frac{\sum_{j:y_j\neq c}d_{ij}+\alpha_i}{\sum_{j:y_j\neq c}{\sum_k{d_{kj}}}+\alpha}$$</code></li> + <li><code class="highlighter-rouge">$$w_{ci}=-\log{\hat\theta_{ci}}$$</code></li> + <li><code class="highlighter-rouge">$$w_{ci}=\frac{w_{ci}}{\sum_i \lvert w_{ci}\rvert}$$</code></li> </ol> </li> <li><strong>Label Assignment/Testing:</strong> <ol> - <li>Let <code class="highlighter-rouge">\(\vec{t}= (t_1,...,t_n)\)</code> be a test document; let <code class="highlighter-rouge">\(t_i\)</code> be the count of the word <code class="highlighter-rouge">\(t\)</code>.</li> - <li>Label the document according to <code class="highlighter-rouge">\(l(t)=\arg\max_c \sum\limits_{i} t_i w_{ci}\)</code></li> + <li>Let <code class="highlighter-rouge">$$\vec{t}= (t_1,...,t_n)$$</code> be a test document; let <code class="highlighter-rouge">$$t_i$$</code> be the count of the word <code class="highlighter-rouge">$$t$$</code>.</li> + <li>Label the document according to <code class="highlighter-rouge">$$l(t)=\arg\max_c \sum\limits_{i} t_i w_{ci}$$</code></li> </ol> </li> 
</ul> -<p>As we can see, the main difference between Bayes and CBayes is the weight calculation step. Where Bayes weighs terms more heavily based on the likelihood that they belong to class <code class="highlighter-rouge">\(c\)</code>, CBayes seeks to maximize term weights on the likelihood that they do not belong to any other class.</p> +<p>As we can see, the main difference between Bayes and CBayes is the weight calculation step. Where Bayes weighs terms more heavily based on the likelihood that they belong to class <code class="highlighter-rouge">$$c$$</code>, CBayes seeks to maximize term weights on the likelihood that they do not belong to any other class.</p> <h2 id="running-from-the-command-line">Running from the command line</h2> http://git-wip-us.apache.org/repos/asf/mahout/blob/bdaf56d2/docs/latest/algorithms/map-reduce/clustering/spectral-clustering.html ---------------------------------------------------------------------- diff --git a/docs/latest/algorithms/map-reduce/clustering/spectral-clustering.html b/docs/latest/algorithms/map-reduce/clustering/spectral-clustering.html index 0a808c2..697b563 100644 --- a/docs/latest/algorithms/map-reduce/clustering/spectral-clustering.html +++ b/docs/latest/algorithms/map-reduce/clustering/spectral-clustering.html @@ -173,16 +173,16 @@ <ol> <li> - <p>Computing a similarity (or <em>affinity</em>) matrix <code class="highlighter-rouge">\(\mathbf{A}\)</code> from the data. This involves determining a pairwise distance function <code class="highlighter-rouge">\(f\)</code> that takes a pair of data points and returns a scalar.</p> + <p>Computing a similarity (or <em>affinity</em>) matrix <code class="highlighter-rouge">$$\mathbf{A}$$</code> from the data. This involves determining a pairwise distance function <code class="highlighter-rouge">$$f$$</code> that takes a pair of data points and returns a scalar.</p> </li> <li> - <p>Computing a graph Laplacian <code class="highlighter-rouge">\(\mathbf{L}\)</code> from the affinity matrix. There are several types of graph Laplacians; which is used will often depends on the situation.</p> + <p>Computing a graph Laplacian <code class="highlighter-rouge">$$\mathbf{L}$$</code> from the affinity matrix. There are several types of graph Laplacians; which is used will often depends on the situation.</p> </li> <li> - <p>Computing the eigenvectors and eigenvalues of <code class="highlighter-rouge">\(\mathbf{L}\)</code>. The degree of this decomposition is often modulated by <code class="highlighter-rouge">\(k\)</code>, or the number of clusters. Put another way, <code class="highlighter-rouge">\(k\)</code> eigenvectors and eigenvalues are computed.</p> + <p>Computing the eigenvectors and eigenvalues of <code class="highlighter-rouge">$$\mathbf{L}$$</code>. The degree of this decomposition is often modulated by <code class="highlighter-rouge">$$k$$</code>, or the number of clusters. Put another way, <code class="highlighter-rouge">$$k$$</code> eigenvectors and eigenvalues are computed.</p> </li> <li> - <p>The <code class="highlighter-rouge">\(k\)</code> eigenvectors are used as âproxyâ data for the original dataset, and fed into k-means clustering. The resulting cluster assignments are transparently passed back to the original data.</p> + <p>The <code class="highlighter-rouge">$$k$$</code> eigenvectors are used as âproxyâ data for the original dataset, and fed into k-means clustering. 
The resulting cluster assignments are transparently passed back to the original data.</p> </li> </ol> @@ -196,11 +196,11 @@ <h2 id="input">Input</h2> -<p>The input format for the algorithm currently takes the form of a Hadoop-backed affinity matrix in the form of text files. Each line of the text file specifies a single element of the affinity matrix: the row index <code class="highlighter-rouge">\(i\)</code>, the column index <code class="highlighter-rouge">\(j\)</code>, and the value:</p> +<p>The input format for the algorithm currently takes the form of a Hadoop-backed affinity matrix in the form of text files. Each line of the text file specifies a single element of the affinity matrix: the row index <code class="highlighter-rouge">$$i$$</code>, the column index <code class="highlighter-rouge">$$j$$</code>, and the value:</p> <p><code class="highlighter-rouge">i, j, value</code></p> -<p>The affinity matrix is symmetric, and any unspecified <code class="highlighter-rouge">\(i, j\)</code> pairs are assumed to be 0 for sparsity. The row and column indices are 0-indexed. Thus, only the non-zero entries of either the upper or lower triangular need be specified.</p> +<p>The affinity matrix is symmetric, and any unspecified <code class="highlighter-rouge">$$i, j$$</code> pairs are assumed to be 0 for sparsity. The row and column indices are 0-indexed. Thus, only the non-zero entries of either the upper or lower triangular need be specified.</p> <p>The matrix elements specified in the text files are collected into a Mahout <code class="highlighter-rouge">DistributedRowMatrix</code>.</p> http://git-wip-us.apache.org/repos/asf/mahout/blob/bdaf56d2/docs/latest/algorithms/reccomenders/d-als.html ---------------------------------------------------------------------- diff --git a/docs/latest/algorithms/reccomenders/d-als.html b/docs/latest/algorithms/reccomenders/d-als.html index 96abb53..01ca25f 100644 --- a/docs/latest/algorithms/reccomenders/d-als.html +++ b/docs/latest/algorithms/reccomenders/d-als.html @@ -174,7 +174,7 @@ TODO: Find the ALS Page</p> <h2 id="algorithm">Algorithm</h2> -<p>For the classic QR decomposition of the form <code class="highlighter-rouge">\(\mathbf{A}=\mathbf{QR},\mathbf{A}\in\mathbb{R}^{m\times n}\)</code> a distributed version is fairly easily achieved if <code class="highlighter-rouge">\(\mathbf{A}\)</code> is tall and thin such that <code class="highlighter-rouge">\(\mathbf{A}^{\top}\mathbf{A}\)</code> fits in memory, i.e. <em>m</em> is large but <em>n</em> < ~5000 Under such circumstances, only <code class="highlighter-rouge">\(\mathbf{A}\)</code> and <code class="highlighter-rouge">\(\mathbf{Q}\)</code> are distributed matricies and <code class="highlighter-rouge">\(\mathbf{A^{\top}A}\)</code> and <code class="highlighter-rouge">\(\mathbf{R}\)</code> are in-core products. We just compute the in-core version of the Cholesky decomposition in the form of <code class="highlighter-rouge">\(\mathbf{LL}^{\top}= \mathbf{A}^{\top}\mathbf{A}\)</code>. After that we take <code class="highlighter-rouge">\(\mathbf{R}= \mathbf{L}^{\top}\)</co de> and <code class="highlighter-rouge">\(\mathbf{Q}=\mathbf{A}\left(\mathbf{L}^{\top}\right)^{-1}\)</code>. The latter is easily achieved by multiplying each verticle block of <code class="highlighter-rouge">\(\mathbf{A}\)</code> by <code class="highlighter-rouge">\(\left(\mathbf{L}^{\top}\right)^{-1}\)</code>. 
(There is no actual matrix inversion happening).</p> +<p>For the classic QR decomposition of the form <code class="highlighter-rouge">$$\mathbf{A}=\mathbf{QR},\mathbf{A}\in\mathbb{R}^{m\times n}$$</code> a distributed version is fairly easily achieved if <code class="highlighter-rouge">$$\mathbf{A}$$</code> is tall and thin such that <code class="highlighter-rouge">$$\mathbf{A}^{\top}\mathbf{A}$$</code> fits in memory, i.e. <em>m</em> is large but <em>n</em> < ~5000 Under such circumstances, only <code class="highlighter-rouge">$$\mathbf{A}$$</code> and <code class="highlighter-rouge">$$\mathbf{Q}$$</code> are distributed matricies and <code class="highlighter-rouge">$$\mathbf{A^{\top}A}$$</code> and <code class="highlighter-rouge">$$\mathbf{R}$$</code> are in-core products. We just compute the in-core version of the Cholesky decomposition in the form of <code class="highlighter-rouge">$$\mathbf{LL}^{\top}= \mathbf{A}^{\top}\mathbf{A}$$</code>. After that we take <code class="highlighter-rouge">$$\mathbf{R}= \mathbf{L}^{\top}$$</co de> and <code class="highlighter-rouge">$$\mathbf{Q}=\mathbf{A}\left(\mathbf{L}^{\top}\right)^{-1}$$</code>. The latter is easily achieved by multiplying each verticle block of <code class="highlighter-rouge">$$\mathbf{A}$$</code> by <code class="highlighter-rouge">$$\left(\mathbf{L}^{\top}\right)^{-1}$$</code>. (There is no actual matrix inversion happening).</p> <h2 id="implementation">Implementation</h2> http://git-wip-us.apache.org/repos/asf/mahout/blob/bdaf56d2/docs/latest/algorithms/regression/ols.html ---------------------------------------------------------------------- diff --git a/docs/latest/algorithms/regression/ols.html b/docs/latest/algorithms/regression/ols.html index 241062d..41414ea 100644 --- a/docs/latest/algorithms/regression/ols.html +++ b/docs/latest/algorithms/regression/ols.html @@ -191,12 +191,12 @@ This is in stark contrast to many âbig data machine learningâ frameworks whi </tr> <tr> <td><code>'calcStandardErrors</code></td> - <td>Calculate the standard errors (and subsequent "t-scores" and "p-values") of the \(\boldsymbol{\beta}$$ estimates</td> + <td>Calculate the standard errors (and subsequent "t-scores" and "p-values") of the $$\boldsymbol{\beta}$$ estimates</td> <td><code>true</code></td> </tr> <tr> <td><code>'addIntercept</code></td> - <td>Add an intercept to \(\mathbf{X}$$</td> + <td>Add an intercept to $$\mathbf{X}$$</td> <td><code>true</code></td> </tr> </table> http://git-wip-us.apache.org/repos/asf/mahout/blob/bdaf56d2/docs/latest/index.html ---------------------------------------------------------------------- diff --git a/docs/latest/index.html b/docs/latest/index.html index fabade9..d78820f 100644 --- a/docs/latest/index.html +++ b/docs/latest/index.html @@ -219,10 +219,10 @@ which are wrappers around RDDs (in Spark).</p> </div> <p>Which is</p> -<center>\(\mathbf{A^\intercal A}\)</center> +<center>$$\mathbf{A^\intercal A}$$</center> <p>Transposing a large matrix is a very expensive thing to do, and in this case we donât actually need to do it. 
There is a -more efficient way to calculate <foo>\(\mathbf{A^\intercal A}\)</foo> that doesnât require a physical transpose.</p> +more efficient way to calculate <foo>$$\mathbf{A^\intercal A}$$</foo> that doesnât require a physical transpose.</p> <p>(Image showing this)</p> http://git-wip-us.apache.org/repos/asf/mahout/blob/bdaf56d2/docs/latest/mahout-samsara/in-core-reference.html ---------------------------------------------------------------------- diff --git a/docs/latest/mahout-samsara/in-core-reference.html b/docs/latest/mahout-samsara/in-core-reference.html index 2fc8671..509ecef 100644 --- a/docs/latest/mahout-samsara/in-core-reference.html +++ b/docs/latest/mahout-samsara/in-core-reference.html @@ -414,7 +414,7 @@ a !== b </code></pre> </div> -<p><em>note: Transposition is currently handled via view, i.e. updating a transposed matrix will be updating the original.</em> Also computing something like <code class="highlighter-rouge">\(\mathbf{X^\top}\mathbf{X}\)</code>:</p> +<p><em>note: Transposition is currently handled via view, i.e. updating a transposed matrix will be updating the original.</em> Also computing something like <code class="highlighter-rouge">$$\mathbf{X^\top}\mathbf{X}$$</code>:</p> <div class="highlighter-rouge"><pre class="highlight"><code>val XtX = X.t %*% X </code></pre> @@ -470,19 +470,19 @@ a !== b <p><strong>Solving linear equation systems and matrix inversion:</strong> fully similar to R semantics; there are three forms of invocation:</p> -<p>Solve <code class="highlighter-rouge">\(\mathbf{AX}=\mathbf{B}\)</code>:</p> +<p>Solve <code class="highlighter-rouge">$$\mathbf{AX}=\mathbf{B}$$</code>:</p> <div class="highlighter-rouge"><pre class="highlight"><code>solve(A, B) </code></pre> </div> -<p>Solve <code class="highlighter-rouge">\(\mathbf{Ax}=\mathbf{b}\)</code>:</p> +<p>Solve <code class="highlighter-rouge">$$\mathbf{Ax}=\mathbf{b}$$</code>:</p> <div class="highlighter-rouge"><pre class="highlight"><code>solve(A, b) </code></pre> </div> -<p>Compute <code class="highlighter-rouge">\(\mathbf{A^{-1}}\)</code>:</p> +<p>Compute <code class="highlighter-rouge">$$\mathbf{A^{-1}}$$</code>:</p> <div class="highlighter-rouge"><pre class="highlight"><code>solve(A) </code></pre> @@ -520,19 +520,19 @@ m.rowMeans <h4 id="random-matrices">Random Matrices</h4> -<p><code class="highlighter-rouge">\(\mathcal{U}\)</code>(0,1) random matrix view:</p> +<p><code class="highlighter-rouge">$$\mathcal{U}$$</code>(0,1) random matrix view:</p> <div class="highlighter-rouge"><pre class="highlight"><code>val incCoreA = Matrices.uniformView(m, n, seed) </code></pre> </div> -<p><code class="highlighter-rouge">\(\mathcal{U}\)</code>(-1,1) random matrix view:</p> +<p><code class="highlighter-rouge">$$\mathcal{U}$$</code>(-1,1) random matrix view:</p> <div class="highlighter-rouge"><pre class="highlight"><code>val incCoreA = Matrices.symmetricUniformView(m, n, seed) </code></pre> </div> -<p><code class="highlighter-rouge">\(\mathcal{N}\)</code>(-1,1) random matrix view:</p> +<p><code class="highlighter-rouge">$$\mathcal{N}$$</code>(-1,1) random matrix view:</p> <div class="highlighter-rouge"><pre class="highlight"><code>val incCoreA = Matrices.gaussianView(m, n, seed) </code></pre> http://git-wip-us.apache.org/repos/asf/mahout/blob/bdaf56d2/docs/latest/mahout-samsara/out-of-core-reference.html ---------------------------------------------------------------------- diff --git a/docs/latest/mahout-samsara/out-of-core-reference.html b/docs/latest/mahout-samsara/out-of-core-reference.html 
index d1b3908..1b33383 100644 --- a/docs/latest/mahout-samsara/out-of-core-reference.html +++ b/docs/latest/mahout-samsara/out-of-core-reference.html @@ -324,7 +324,7 @@ inCoreA /: B <p><strong>Matrix-matrix multiplication %*%</strong>:</p> -<p><code class="highlighter-rouge">\(\mathbf{M}=\mathbf{AB}\)</code></p> +<p><code class="highlighter-rouge">$$\mathbf{M}=\mathbf{AB}$$</code></p> <div class="highlighter-rouge"><pre class="highlight"><code>A %*% B A %*% inCoreB @@ -336,7 +336,7 @@ A %*%: B <p><em>Note: same as above, whenever operator arguments include both in-core and out-of-core arguments, the operator can only be associated with the out-of-core (DRM) argument to support the distributed implementation.</em></p> <p><strong>Matrix-vector multiplication %*%</strong> -Currently we support a right multiply product of a DRM and an in-core Vector(<code class="highlighter-rouge">\(\mathbf{Ax}\)</code>) resulting in a single column DRM, which then can be collected in front (usually the desired outcome):</p> +Currently we support a right multiply product of a DRM and an in-core Vector(<code class="highlighter-rouge">$$\mathbf{Ax}$$</code>) resulting in a single column DRM, which then can be collected in front (usually the desired outcome):</p> <div class="highlighter-rouge"><pre class="highlight"><code>val Ax = A %*% x val inCoreX = Ax.collect(::, 0) @@ -356,7 +356,7 @@ A / 5.0 </code></pre> </div> -<p>Note that <code class="highlighter-rouge">5.0 -: A</code> means <code class="highlighter-rouge">\(m_{ij} = 5 - a_{ij}\)</code> and <code class="highlighter-rouge">5.0 /: A</code> means <code class="highlighter-rouge">\(m_{ij} = \frac{5}{a{ij}}\)</code> for all elements of the result.</p> +<p>Note that <code class="highlighter-rouge">5.0 -: A</code> means <code class="highlighter-rouge">$$m_{ij} = 5 - a_{ij}$$</code> and <code class="highlighter-rouge">5.0 /: A</code> means <code class="highlighter-rouge">$$m_{ij} = \frac{5}{a{ij}}$$</code> for all elements of the result.</p> <h4 id="slicing">Slicing</h4> http://git-wip-us.apache.org/repos/asf/mahout/blob/bdaf56d2/docs/latest/tutorials/samsara/play-with-shell.html ---------------------------------------------------------------------- diff --git a/docs/latest/tutorials/samsara/play-with-shell.html b/docs/latest/tutorials/samsara/play-with-shell.html index 2574abf..fbe89ff 100644 --- a/docs/latest/tutorials/samsara/play-with-shell.html +++ b/docs/latest/tutorials/samsara/play-with-shell.html @@ -314,15 +314,15 @@ val drmData = drmParallelize(dense( <p>Have a look at this matrix. The first four columns represent the ingredients (our features) and the last column (the rating) is the target variable for our regression. <a href="https://en.wikipedia.org/wiki/Linear_regression">Linear regression</a> -assumes that the <strong>target variable</strong> <code class="highlighter-rouge">\(\mathbf{y}\)</code> is generated by the -linear combination of <strong>the feature matrix</strong> <code class="highlighter-rouge">\(\mathbf{X}\)</code> with the -<strong>parameter vector</strong> <code class="highlighter-rouge">\(\boldsymbol{\beta}\)</code> plus the - <strong>noise</strong> <code class="highlighter-rouge">\(\boldsymbol{\varepsilon}\)</code>, summarized in the formula -<code class="highlighter-rouge">\(\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon}\)</code>. 
+assumes that the <strong>target variable</strong> <code class="highlighter-rouge">$$\mathbf{y}$$</code> is generated by the +linear combination of <strong>the feature matrix</strong> <code class="highlighter-rouge">$$\mathbf{X}$$</code> with the +<strong>parameter vector</strong> <code class="highlighter-rouge">$$\boldsymbol{\beta}$$</code> plus the + <strong>noise</strong> <code class="highlighter-rouge">$$\boldsymbol{\varepsilon}$$</code>, summarized in the formula +<code class="highlighter-rouge">$$\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon}$$</code>. Our goal is to find an estimate of the parameter vector -<code class="highlighter-rouge">\(\boldsymbol{\beta}\)</code> that explains the data very well.</p> +<code class="highlighter-rouge">$$\boldsymbol{\beta}$$</code> that explains the data very well.</p> -<p>As a first step, we extract <code class="highlighter-rouge">\(\mathbf{X}\)</code> and <code class="highlighter-rouge">\(\mathbf{y}\)</code> from our data matrix. We get <em>X</em> by slicing: we take all rows (denoted by <code class="highlighter-rouge">::</code>) and the first four columns, which have the ingredients in milligrams as content. Note that the result is again a DRM. The shell will not execute this code yet, it saves the history of operations and defers the execution until we really access a result. <strong>Mahoutâs DSL automatically optimizes and parallelizes all operations on DRMs and runs them on Apache Spark.</strong></p> +<p>As a first step, we extract <code class="highlighter-rouge">$$\mathbf{X}$$</code> and <code class="highlighter-rouge">$$\mathbf{y}$$</code> from our data matrix. We get <em>X</em> by slicing: we take all rows (denoted by <code class="highlighter-rouge">::</code>) and the first four columns, which have the ingredients in milligrams as content. Note that the result is again a DRM. The shell will not execute this code yet, it saves the history of operations and defers the execution until we really access a result. <strong>Mahoutâs DSL automatically optimizes and parallelizes all operations on DRMs and runs them on Apache Spark.</strong></p> <div class="codehilite"><pre> val drmX = drmData(::, 0 until 4) @@ -334,27 +334,27 @@ val drmX = drmData(::, 0 until 4) val y = drmData.collect(::, 4) </pre></div> -<p>Now we are ready to think about a mathematical way to estimate the parameter vector <em>β</em>. A simple textbook approach is <a href="https://en.wikipedia.org/wiki/Ordinary_least_squares">ordinary least squares (OLS)</a>, which minimizes the sum of residual squares between the true target variable and the prediction of the target variable. In OLS, there is even a closed form expression for estimating <code class="highlighter-rouge">\(\boldsymbol{\beta}\)</code> as -<code class="highlighter-rouge">\(\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\mathbf{y}\)</code>.</p> +<p>Now we are ready to think about a mathematical way to estimate the parameter vector <em>β</em>. A simple textbook approach is <a href="https://en.wikipedia.org/wiki/Ordinary_least_squares">ordinary least squares (OLS)</a>, which minimizes the sum of residual squares between the true target variable and the prediction of the target variable. 
In OLS, there is even a closed form expression for estimating <code class="highlighter-rouge">$$\boldsymbol{\beta}$$</code> as +<code class="highlighter-rouge">$$\left(\mathbf{X}^{\top}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\mathbf{y}$$</code>.</p> -<p>The first thing which we compute for this is <code class="highlighter-rouge">\(\mathbf{X}^{\top}\mathbf{X}\)</code>. The code for doing this in Mahoutâs scala DSL maps directly to the mathematical formula. The operation <code class="highlighter-rouge">.t()</code> transposes a matrix and analogous to R <code class="highlighter-rouge">%*%</code> denotes matrix multiplication.</p> +<p>The first thing which we compute for this is <code class="highlighter-rouge">$$\mathbf{X}^{\top}\mathbf{X}$$</code>. The code for doing this in Mahoutâs scala DSL maps directly to the mathematical formula. The operation <code class="highlighter-rouge">.t()</code> transposes a matrix and analogous to R <code class="highlighter-rouge">%*%</code> denotes matrix multiplication.</p> <div class="codehilite"><pre> val drmXtX = drmX.t %*% drmX </pre></div> -<p>The same is true for computing <code class="highlighter-rouge">\(\mathbf{X}^{\top}\mathbf{y}\)</code>. We can simply type the math in scala expressions into the shell. Here, <em>X</em> lives in the cluster, while is <em>y</em> in the memory of the driver, and the result is a DRM again.</p> +<p>The same is true for computing <code class="highlighter-rouge">$$\mathbf{X}^{\top}\mathbf{y}$$</code>. We can simply type the math in scala expressions into the shell. Here, <em>X</em> lives in the cluster, while is <em>y</em> in the memory of the driver, and the result is a DRM again.</p> <div class="codehilite"><pre> val drmXty = drmX.t %*% y </pre></div> -<p>Weâre nearly done. The next step we take is to fetch <code class="highlighter-rouge">\(\mathbf{X}^{\top}\mathbf{X}\)</code> and -<code class="highlighter-rouge">\(\mathbf{X}^{\top}\mathbf{y}\)</code> into the memory of our driver machine (we are targeting +<p>Weâre nearly done. The next step we take is to fetch <code class="highlighter-rouge">$$\mathbf{X}^{\top}\mathbf{X}$$</code> and +<code class="highlighter-rouge">$$\mathbf{X}^{\top}\mathbf{y}$$</code> into the memory of our driver machine (we are targeting features matrices that are tall and skinny , -so we can assume that <code class="highlighter-rouge">\(\mathbf{X}^{\top}\mathbf{X}\)</code> is small enough +so we can assume that <code class="highlighter-rouge">$$\mathbf{X}^{\top}\mathbf{X}$$</code> is small enough to fit in). Then, we provide them to an in-memory solver (Mahout provides the an analog to Râs <code class="highlighter-rouge">solve()</code> for that) which computes <code class="highlighter-rouge">beta</code>, our -OLS estimate of the parameter vector <code class="highlighter-rouge">\(\boldsymbol{\beta}\)</code>.</p> +OLS estimate of the parameter vector <code class="highlighter-rouge">$$\boldsymbol{\beta}$$</code>.</p> <div class="codehilite"><pre> val XtX = drmXtX.collect @@ -371,9 +371,9 @@ as much as possible, while still retaining decent performance and scalability.</p> <p>We can now check how well our model fits its training data. -First, we multiply the feature matrix <code class="highlighter-rouge">\(\mathbf{X}\)</code> by our estimate of -<code class="highlighter-rouge">\(\boldsymbol{\beta}\)</code>. 
Then, we look at the difference (via L2-norm) of -the target variable <code class="highlighter-rouge">\(\mathbf{y}\)</code> to the fitted target variable:</p> +First, we multiply the feature matrix <code class="highlighter-rouge">$$\mathbf{X}$$</code> by our estimate of +<code class="highlighter-rouge">$$\boldsymbol{\beta}$$</code>. Then, we look at the difference (via L2-norm) of +the target variable <code class="highlighter-rouge">$$\mathbf{y}$$</code> to the fitted target variable:</p> <div class="codehilite"><pre> val yFitted = (drmX %*% beta).collect(::, 0) @@ -406,7 +406,7 @@ def goodnessOfFit(drmX: DrmLike[Int], beta: Vector, y: Vector) = { model. Usually there is a constant bias term added to the model. Without that, our model always crosses through the origin and we only learn the right angle. An easy way to add such a bias term to our model is to add a -column of ones to the feature matrix <code class="highlighter-rouge">\(\mathbf{X}\)</code>. +column of ones to the feature matrix <code class="highlighter-rouge">$$\mathbf{X}$$</code>. The corresponding weight in the parameter vector will then be the bias term.</p> <p>Here is how we add a bias column:</p> http://git-wip-us.apache.org/repos/asf/mahout/blob/bdaf56d2/docs/latest/tutorials/samsara/spark-naive-bayes.html ---------------------------------------------------------------------- diff --git a/docs/latest/tutorials/samsara/spark-naive-bayes.html b/docs/latest/tutorials/samsara/spark-naive-bayes.html index dfa8a6d..b0b4819 100644 --- a/docs/latest/tutorials/samsara/spark-naive-bayes.html +++ b/docs/latest/tutorials/samsara/spark-naive-bayes.html @@ -181,38 +181,38 @@ <p>As described in <a href="http://people.csail.mit.edu/jrennie/papers/icml03-nb.pdf">[1]</a> Mahout Naive Bayes is broken down into the following steps (assignments are over all possible index values):</p> <ul> - <li>Let <code class="highlighter-rouge">\(\vec{d}=(\vec{d_1},...,\vec{d_n})\)</code> be a set of documents; <code class="highlighter-rouge">\(d_{ij}\)</code> is the count of word <code class="highlighter-rouge">\(i\)</code> in document <code class="highlighter-rouge">\(j\)</code>.</li> - <li>Let <code class="highlighter-rouge">\(\vec{y}=(y_1,...,y_n)\)</code> be their labels.</li> - <li>Let <code class="highlighter-rouge">\(\alpha_i\)</code> be a smoothing parameter for all words in the vocabulary; let <code class="highlighter-rouge">\(\alpha=\sum_i{\alpha_i}\)</code>.</li> - <li><strong>Preprocessing</strong>(via seq2Sparse) TF-IDF transformation and L2 length normalization of <code class="highlighter-rouge">\(\vec{d}\)</code> + <li>Let <code class="highlighter-rouge">$$\vec{d}=(\vec{d_1},...,\vec{d_n})$$</code> be a set of documents; <code class="highlighter-rouge">$$d_{ij}$$</code> is the count of word <code class="highlighter-rouge">$$i$$</code> in document <code class="highlighter-rouge">$$j$$</code>.</li> + <li>Let <code class="highlighter-rouge">$$\vec{y}=(y_1,...,y_n)$$</code> be their labels.</li> + <li>Let <code class="highlighter-rouge">$$\alpha_i$$</code> be a smoothing parameter for all words in the vocabulary; let <code class="highlighter-rouge">$$\alpha=\sum_i{\alpha_i}$$</code>.</li> + <li><strong>Preprocessing</strong>(via seq2Sparse) TF-IDF transformation and L2 length normalization of <code class="highlighter-rouge">$$\vec{d}$$</code> <ol> - <li><code class="highlighter-rouge">\(d_{ij} = \sqrt{d_{ij}}\)</code></li> - <li><code class="highlighter-rouge">\(d_{ij} = 
d_{ij}\left(\log{\frac{\sum_k1}{\sum_k\delta_{ik}+1}}+1\right)\)</code></li> - <li><code class="highlighter-rouge">\(d_{ij} =\frac{d_{ij}}{\sqrt{\sum_k{d_{kj}^2}}}\)</code></li> + <li><code class="highlighter-rouge">$$d_{ij} = \sqrt{d_{ij}}$$</code></li> + <li><code class="highlighter-rouge">$$d_{ij} = d_{ij}\left(\log{\frac{\sum_k1}{\sum_k\delta_{ik}+1}}+1\right)$$</code></li> + <li><code class="highlighter-rouge">$$d_{ij} =\frac{d_{ij}}{\sqrt{\sum_k{d_{kj}^2}}}$$</code></li> </ol> </li> - <li><strong>Training: Bayes</strong><code class="highlighter-rouge">\((\vec{d},\vec{y})\)</code> calculate term weights <code class="highlighter-rouge">\(w_{ci}\)</code> as: + <li><strong>Training: Bayes</strong><code class="highlighter-rouge">$$(\vec{d},\vec{y})$$</code> calculate term weights <code class="highlighter-rouge">$$w_{ci}$$</code> as: <ol> - <li><code class="highlighter-rouge">\(\hat\theta_{ci}=\frac{d_{ic}+\alpha_i}{\sum_k{d_{kc}}+\alpha}\)</code></li> - <li><code class="highlighter-rouge">\(w_{ci}=\log{\hat\theta_{ci}}\)</code></li> + <li><code class="highlighter-rouge">$$\hat\theta_{ci}=\frac{d_{ic}+\alpha_i}{\sum_k{d_{kc}}+\alpha}$$</code></li> + <li><code class="highlighter-rouge">$$w_{ci}=\log{\hat\theta_{ci}}$$</code></li> </ol> </li> - <li><strong>Training: CBayes</strong><code class="highlighter-rouge">\((\vec{d},\vec{y})\)</code> calculate term weights <code class="highlighter-rouge">\(w_{ci}\)</code> as: + <li><strong>Training: CBayes</strong><code class="highlighter-rouge">$$(\vec{d},\vec{y})$$</code> calculate term weights <code class="highlighter-rouge">$$w_{ci}$$</code> as: <ol> - <li><code class="highlighter-rouge">\(\hat\theta_{ci} = \frac{\sum_{j:y_j\neq c}d_{ij}+\alpha_i}{\sum_{j:y_j\neq c}{\sum_k{d_{kj}}}+\alpha}\)</code></li> - <li><code class="highlighter-rouge">\(w_{ci}=-\log{\hat\theta_{ci}}\)</code></li> - <li><code class="highlighter-rouge">\(w_{ci}=\frac{w_{ci}}{\sum_i \lvert w_{ci}\rvert}\)</code></li> + <li><code class="highlighter-rouge">$$\hat\theta_{ci} = \frac{\sum_{j:y_j\neq c}d_{ij}+\alpha_i}{\sum_{j:y_j\neq c}{\sum_k{d_{kj}}}+\alpha}$$</code></li> + <li><code class="highlighter-rouge">$$w_{ci}=-\log{\hat\theta_{ci}}$$</code></li> + <li><code class="highlighter-rouge">$$w_{ci}=\frac{w_{ci}}{\sum_i \lvert w_{ci}\rvert}$$</code></li> </ol> </li> <li><strong>Label Assignment/Testing:</strong> <ol> - <li>Let <code class="highlighter-rouge">\(\vec{t}= (t_1,...,t_n)\)</code> be a test document; let <code class="highlighter-rouge">\(t_i\)</code> be the count of the word <code class="highlighter-rouge">\(t\)</code>.</li> - <li>Label the document according to <code class="highlighter-rouge">\(l(t)=\arg\max_c \sum\limits_{i} t_i w_{ci}\)</code></li> + <li>Let <code class="highlighter-rouge">$$\vec{t}= (t_1,...,t_n)$$</code> be a test document; let <code class="highlighter-rouge">$$t_i$$</code> be the count of the word <code class="highlighter-rouge">$$t$$</code>.</li> + <li>Label the document according to <code class="highlighter-rouge">$$l(t)=\arg\max_c \sum\limits_{i} t_i w_{ci}$$</code></li> </ol> </li> </ul> -<p>As we can see, the main difference between Bayes and CBayes is the weight calculation step. Where Bayes weighs terms more heavily based on the likelihood that they belong to class <code class="highlighter-rouge">\(c\)</code>, CBayes seeks to maximize term weights on the likelihood that they do not belong to any other class.</p> +<p>As we can see, the main difference between Bayes and CBayes is the weight calculation step. 
Where Bayes weighs terms more heavily based on the likelihood that they belong to class <code class="highlighter-rouge">\(c\)</code>, CBayes seeks to maximize term weights on the likelihood that they do not belong to any other class.</p>
+<p>As we can see, the main difference between Bayes and CBayes is the weight calculation step. Where Bayes weighs terms more heavily based on the likelihood that they belong to class <code class="highlighter-rouge">$$c$$</code>, CBayes seeks to maximize term weights on the likelihood that they do not belong to any other class.</p>
 
 <h2 id="running-from-the-command-line">Running from the command line</h2>
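As a quick sanity check of the Bayes/CBayes weight formulas quoted above (and separate from the command-line usage that page goes on to cover), they can be evaluated in a few lines of plain Scala. This is an illustrative sketch only, not Mahout's implementation (Mahout runs these steps distributed over DRMs); every name in it (`CBayesSketch`, `docs`, `labels`, `alphaI`, the toy counts) is invented for the example.

<div class="codehilite"><pre>
// Toy sketch of the CBayes training weights and label-assignment rule.
// Not Mahout's implementation; all data and names are invented for illustration.
object CBayesSketch extends App {

  // docs(j)(i) = count of word i in document j (3 documents, 4 words)
  val docs: Array[Array[Double]] = Array(
    Array(2.0, 0.0, 1.0, 0.0),
    Array(0.0, 3.0, 0.0, 1.0),
    Array(1.0, 0.0, 2.0, 0.0))
  val labels = Array(0, 1, 0)            // y_j: class label of document j
  val numClasses = 2
  val numWords   = docs.head.length

  val alphaI = 1.0                       // smoothing alpha_i (same for every word)
  val alpha  = alphaI * numWords         // alpha = sum_i alpha_i

  // CBayes: theta_ci is estimated from all documents NOT labeled c,
  // then w_ci = -log(theta_ci).
  val weights = Array.tabulate(numClasses, numWords) { (c, i) =>
    val others = docs.indices.filter(j => labels(j) != c)
    val numer  = others.map(j => docs(j)(i)).sum + alphaI
    val denom  = others.map(j => docs(j).sum).sum + alpha
    -math.log(numer / denom)
  }

  // Per-class normalization: w_ci = w_ci / sum_i |w_ci|
  val wNorm = weights.map { row =>
    val s = row.map(w => math.abs(w)).sum
    row.map(_ / s)
  }

  // Label assignment: l(t) = argmax_c sum_i t_i * w_ci for a test document t
  val t = Array(1.0, 0.0, 2.0, 0.0)
  val scores = wNorm.map(row => row.zip(t).map { case (w, ti) => w * ti }.sum)
  println(s"predicted class: ${scores.indexOf(scores.max)}")
}
</pre></div>

Swapping the complement sums for the per-class sums, and dropping the sign flip and the normalization, gives the plain Bayes weights described in the same list above.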
