Author: pat
Date: Fri Sep 5 15:10:17 2014
New Revision: 1622718
URL: http://svn.apache.org/r1622718
Log:
replace left angle brace in non-code blocks
Modified:
mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext
Modified:
mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext
URL:
http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext?rev=1622718&r1=1622717&r2=1622718&view=diff
==============================================================================
---
mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext
(original)
+++
mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext
Fri Sep 5 15:10:17 2014
@@ -214,9 +214,14 @@ Can be parsed with the following CLI and
##2. spark-rowsimilarity
-*spark-rowsimilarity* is the companion to *spark-itemsimilarity* the primary
difference is that it takes a text file version of a DRM with optional
application specific IDs. The input is in text-delimited form where there are
three delimiters used. By default it reads
(rowID<tab>columnID1:strength1<space>columnID2:strength2...) Since this job
only supports LLR similarity, which does not use the input strengths, they may
be omitted in the input. It writes
(columnID<tab>columnID1:strength1<space>columnID2:strength2...) The output is
sorted by strength descending. The output can be interpreted as a column id
from the primary input followed by a list of the most similar columns. For a
discussion of the output layout and formatting see *spark-itemsimilarity*.
-
-One significant output option is --omitStrength. This allows output of the
form (columnID<tab>columnID2<space>columnID2<space>...) This is a tab-delimited
file containing a columnID token followed by a space delimited string of
tokens. It can be directly indexed by search engines to create an item-based
recommender.
+*spark-rowsimilarity* is the companion to *spark-itemsimilarity*. The primary
difference is that it takes a text file version of
+a matrix of sparse vectors with optional application-specific IDs, and it finds
similar rows rather than items (columns). Its use is
+not limited to collaborative filtering. The input is in text-delimited form
with three delimiters. By
+default it reads
(rowID<tab>columnID1:strength1<space>columnID2:strength2...). Since this
job only supports LLR similarity,
+which does not use the input strengths, they may be omitted in the input. It
writes
+(rowID<tab>rowID1:strength1<space>rowID2:strength2...).
+The output is sorted by strength descending. The output can be interpreted as
a row ID from the primary input followed
+by a list of the most similar rows.
The command line interface is:
@@ -311,4 +316,3 @@ Another use case for these jobs is in fi
-
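
For readers of this diff, the default *spark-rowsimilarity* text layout described in the added paragraph can be illustrated with a small sketch. This is not Mahout code: `parse_row` and `format_row` are hypothetical helper names, and the only things taken from the text above are the three default delimiters (tab, space, colon), the optional input strengths, and the strength-descending output order.

```python
def parse_row(line, row_delim="\t", elem_delim=" ", strength_delim=":"):
    """Parse 'rowID<tab>columnID1:strength1<space>columnID2:strength2...'.

    Returns (row_id, [(column_id, strength_or_None), ...]).
    Strengths are optional, since LLR similarity does not use them.
    """
    row_id, _, rest = line.rstrip("\n").partition(row_delim)
    elements = []
    for token in rest.split(elem_delim):
        if not token:
            continue  # tolerate repeated or trailing element delimiters
        col_id, sep, strength = token.partition(strength_delim)
        # sep is empty when the token carries no strength
        elements.append((col_id, float(strength) if sep else None))
    return row_id, elements


def format_row(row_id, similar, row_delim="\t", elem_delim=" ", strength_delim=":"):
    """Format 'rowID<tab>rowID1:strength1<space>rowID2:strength2...'.

    `similar` is a list of (other_row_id, strength) pairs; the output
    lists them sorted by strength, descending.
    """
    ordered = sorted(similar, key=lambda pair: pair[1], reverse=True)
    body = elem_delim.join(f"{rid}{strength_delim}{s}" for rid, s in ordered)
    return f"{row_id}{row_delim}{body}"
```

For example, `parse_row("u1\ti1 i2")` accepts a line whose strengths were omitted, and `format_row` produces one output line per row with its most similar rows first.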