Author: buildbot
Date: Fri Sep 5 15:10:21 2014
New Revision: 921333
Log:
Staging update by buildbot for mahout
Modified:
websites/staging/mahout/trunk/content/ (props changed)
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
Propchange: websites/staging/mahout/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Fri Sep 5 15:10:21 2014
@@ -1 +1 @@
-1622492
+1622718
Modified:
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
==============================================================================
---
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
(original)
+++
websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
Fri Sep 5 15:10:21 2014
@@ -453,8 +453,14 @@ to recommend. </p>
<h2 id="2-spark-rowsimilarity">2. spark-rowsimilarity</h2>
-<p><em>spark-rowsimilarity</em> is the companion to
<em>spark-itemsimilarity</em> the primary difference is that it takes a text
file version of a DRM with optional application specific IDs. The input is in
text-delimited form where there are three delimiters used. By default it reads
(rowID<tab>columnID1:strength1<space>columnID2:strength2...) Since this job
only supports LLR similarity, which does not use the input strengths, they may
be omitted in the input. It writes
(columnID<tab>columnID1:strength1<space>columnID2:strength2...) The output is
sorted by strength descending. The output can be interpreted as a column id
from the primary input followed by a list of the most similar columns. For a
discussion of the output layout and formatting see
<em>spark-itemsimilarity</em>. </p>
-<p>One significant output option is --omitStrength. This allows output of the
form (columnID<tab>columnID2<space>columnID2<space>...) This is a tab-delimited
file containing a columnID token followed by a space delimited string of
tokens. It can be directly indexed by search engines to create an item-based
recommender.</p>
+<p><em>spark-rowsimilarity</em> is the companion to
<em>spark-itemsimilarity</em>. The primary difference is that it takes a text
file version of
+a matrix of sparse vectors with optional application-specific IDs, and it finds
similar rows rather than items (columns). Its use is
+not limited to collaborative filtering. The input is in text-delimited form
that uses three delimiters. By
+default it reads
(rowID<tab>columnID1:strength1<space>columnID2:strength2...). Because
this job supports only LLR similarity,
+ which does not use the input strengths, the strengths may be omitted from the input. It
writes
+(rowID<tab>rowID1:strength1<space>rowID2:strength2...).
+The output is sorted by strength descending. The output can be interpreted as
a row ID from the primary input followed
+by a list of the most similar rows.</p>
<p>The command line interface is:</p>
<div class="codehilite"><pre><span class="n">spark</span><span
class="o">-</span><span class="n">rowsimilarity</span> <span
class="n">Mahout</span> 1<span class="p">.</span>0
<span class="n">Usage</span><span class="p">:</span> <span
class="n">spark</span><span class="o">-</span><span
class="n">rowsimilarity</span> <span class="p">[</span><span
class="n">options</span><span class="p">]</span>