intro-cooccurrence-spark.html

buildbot Fri, 05 Sep 2014 08:11:34 -0700

Author: buildbot
Date: Fri Sep  5 15:10:21 2014
New Revision: 921333

Log:
Staging update by buildbot for mahout


Modified:
    websites/staging/mahout/trunk/content/   (props changed)
    websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html

Propchange: websites/staging/mahout/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Fri Sep  5 15:10:21 2014
@@ -1 +1 @@
-1622492
+1622718

Modified: websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html
==============================================================================
--- websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html (original)
+++ websites/staging/mahout/trunk/content/users/recommender/intro-cooccurrence-spark.html Fri Sep  5 15:10:21 2014
@@ -453,8 +453,14 @@ to recommend.   </p>
 
 
 <h2 id="2-spark-rowsimilarity">2. spark-rowsimilarity</h2>
-<p><em>spark-rowsimilarity</em> is the companion to <em>spark-itemsimilarity</em> the primary difference is that it takes a text file version of a DRM with optional application specific IDs. The input is in text-delimited form where there are three delimiters used. By default it reads (rowID<tab>columnID1:strength1<space>columnID2:strength2...) Since this job only supports LLR similarity, which does not use the input strengths, they may be omitted in the input. It writes (columnID<tab>columnID1:strength1<space>columnID2:strength2...) The output is sorted by strength descending. The output can be interpreted as a column id from the primary input followed by a list of the most similar columns. For a discussion of the output layout and formatting see <em>spark-itemsimilarity</em>. </p>
-<p>One significant output option is --omitStrength. This allows output of the form (columnID<tab>columnID2<space>columnID2<space>...) This is a tab-delimited file containing a columnID token followed by a space delimited string of tokens. It can be directly indexed by search engines to create an item-based recommender.</p>
+<p><em>spark-rowsimilarity</em> is the companion to <em>spark-itemsimilarity</em> the primary difference is that it takes a text file version of
+a matrix of sparse vectors with optional application specific IDs and it finds similar rows rather than items (columns). Its use is
+not limited to collaborative filtering. The input is in text-delimited form where there are three delimiters used. By
+default it reads (rowID&lt;tab&gt;columnID1:strength1&lt;space&gt;columnID2:strength2...) Since this job only supports LLR similarity,
+ which does not use the input strengths, they may be omitted in the input. It writes
+(rowID&lt;tab&gt;rowID1:strength1&lt;space&gt;rowID2:strength2...)
+The output is sorted by strength descending. The output can be interpreted as a row ID from the primary input followed
+by a list of the most similar rows.</p>
 <p>The command line interface is:</p>
 <div class="codehilite"><pre><span class="n">spark</span><span class="o">-</span><span class="n">rowsimilarity</span> <span class="n">Mahout</span> 1<span class="p">.</span>0
 <span class="n">Usage</span><span class="p">:</span> <span class="n">spark</span><span class="o">-</span><span class="n">rowsimilarity</span> <span class="p">[</span><span class="n">options</span><span class="p">]</span>
