Author: pat
Date: Thu Sep 4 14:56:38 2014
New Revision: 1622492
URL: http://svn.apache.org/r1622492
Log:
updating cli help message
Modified:
mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext
Modified:
mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext
URL:
http://svn.apache.org/viewvc/mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext?rev=1622492&r1=1622491&r2=1622492&view=diff
==============================================================================
--- mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext (original)
+++ mahout/site/mahout_cms/trunk/content/users/recommender/intro-cooccurrence-spark.mdtext Thu Sep 4 14:56:38 2014
@@ -21,16 +21,15 @@ cross-cooccurrence is a more principled
to recommend.
- spark-itemsimilarity Mahout 1.0-SNAPSHOT
+ spark-itemsimilarity Mahout 1.0
Usage: spark-itemsimilarity [options]
+ Disconnected from the target VM, address: '127.0.0.1:64676', transport: 'socket'
Input, output options
-i <value> | --input <value>
- Input path, may be a filename, directory name, or comma delimited list of
- HDFS supported URIs (required)
+ Input path, may be a filename, directory name, or comma delimited list of HDFS supported URIs (required)
-i2 <value> | --input2 <value>
- Secondary input path for cross-similarity calculation, same restrictions
- as "--input" (optional). Default: empty.
+ Secondary input path for cross-similarity calculation, same restrictions as "--input" (optional). Default: empty.
-o <value> | --output <value>
Path for output, any local or HDFS supported URI (required)
@@ -38,8 +37,7 @@ to recommend.
-mppu <value> | --maxPrefs <value>
Max number of preferences to consider per user (optional).
Default: 500
-m <value> | --maxSimilaritiesPerItem <value>
- Limit the number of similarities per item to this number (optional).
- Default: 100
+ Limit the number of similarities per item to this number (optional). Default: 100
Note: Only the Log Likelihood Ratio (LLR) is supported as a similarity
measure.
@@ -47,56 +45,42 @@ to recommend.
-id <value> | --inDelim <value>
Input delimiter character (optional). Default: "[,\t]"
-f1 <value> | --filter1 <value>
- String (or regex) whose presence indicates a datum for the primary item
- set (optional). Default: no filter, all data is used
+ String (or regex) whose presence indicates a datum for the primary item set (optional). Default: no filter, all data is used
-f2 <value> | --filter2 <value>
- String (or regex) whose presence indicates a datum for the secondary item
- set (optional). If not present no secondary dataset is collected
- -rc <value> | --rowIDPosition <value>
- Column number (0 based Int) containing the row ID string (optional).
- Default: 0
- -ic <value> | --itemIDPosition <value>
- Column number (0 based Int) containing the item ID string (optional).
- Default: 1
- -fc <value> | --filterPosition <value>
- Column number (0 based Int) containing the filter string (optional).
- Default: -1 for no filter
+ String (or regex) whose presence indicates a datum for the secondary item set (optional). If not present no secondary dataset is collected
+ -rc <value> | --rowIDColumn <value>
+ Column number (0 based Int) containing the row ID string (optional). Default: 0
+ -ic <value> | --itemIDColumn <value>
+ Column number (0 based Int) containing the item ID string (optional). Default: 1
+ -fc <value> | --filterColumn <value>
+ Column number (0 based Int) containing the filter string (optional). Default: -1 for no filter
Using all defaults the input is expected of the form: "userID<tab>itemId"
or "userID<tab>itemID<tab>any-text..." and all rows will be used
File discovery options:
-r | --recursive
- Searched the -i path recursively for files that match --filenamePattern
- (optional), default: false
+ Searched the -i path recursively for files that match --filenamePattern (optional), Default: false
-fp <value> | --filenamePattern <value>
- Regex to match in determining input files (optional). Default: filename
- in the --input option or "^part-.*" if --input is a directory
+ Regex to match in determining input files (optional). Default: filename in the --input option or "^part-.*" if --input is a directory
Output text file schema options:
-rd <value> | --rowKeyDelim <value>
- Separates the rowID key from the vector values list (optional). Default:
- \t"
+ Separates the rowID key from the vector values list (optional). Default: "\t"
-cd <value> | --columnIdStrengthDelim <value>
- Separates column IDs from their values in the vector values list (optional).
- Default: ":"
+ Separates column IDs from their values in the vector values list (optional). Default: ":"
-td <value> | --elementDelim <value>
Separates vector element values in the values list (optional).
Default: " "
-os | --omitStrength
Do not write the strength to the output files (optional), Default:
false.
- This option is used to output indexable data for creating a search engine
- recommender.
+ This option is used to output indexable data for creating a search engine recommender.
Default delimiters will produce output of the form:
"itemID1<tab>itemID2:value2<space>itemID10:value10..."
Spark config options:
-ma <value> | --master <value>
- Spark Master URL (optional). Default: "local". Note that you can specify
- the number of cores to get a performance improvement, for example "local[4]"
+ Spark Master URL (optional). Default: "local". Note that you can specify the number of cores to get a performance improvement, for example "local[4]"
-sem <value> | --sparkExecutorMem <value>
- Max Java heap available as "executor memory" on each node (optional).
- Default: 4g
-
- General config options:
+ Max Java heap available as "executor memory" on each node (optional). Default: 4g
-rs <value> | --randomSeed <value>
-h | --help
@@ -236,61 +220,48 @@ One significant output option is --omitS
The command line interface is:
- spark-rowsimilarity Mahout 1.0-SNAPSHOT
+ spark-rowsimilarity Mahout 1.0
Usage: spark-rowsimilarity [options]
Input, output options
-i <value> | --input <value>
- Input path, may be a filename, directory name, or comma delimited list
- of HDFS supported URIs (required)
- -o <value> | --output <value>
+ Input path, may be a filename, directory name, or comma delimited list of HDFS supported URIs (required)
+ -o <value> | --output <value>
Path for output, any local or HDFS supported URI (required)
Algorithm control options:
-mo <value> | --maxObservations <value>
Max number of observations to consider per row (optional).
Default: 500
-m <value> | --maxSimilaritiesPerRow <value>
- Limit the number of similarities per item to this number (optional).
- Default: 100
+ Limit the number of similarities per item to this number (optional). Default: 100
Note: Only the Log Likelihood Ratio (LLR) is supported as a similarity
measure.
+ Disconnected from the target VM, address: '127.0.0.1:49162', transport: 'socket'
Output text file schema options:
-rd <value> | --rowKeyDelim <value>
- Separates the rowID key from the vector values list (optional).
- Default: "\t"
+ Separates the rowID key from the vector values list (optional). Default: "\t"
-cd <value> | --columnIdStrengthDelim <value>
- Separates column IDs from their values in the vector values list
- (optional). Default: ":"
+ Separates column IDs from their values in the vector values list (optional). Default: ":"
-td <value> | --elementDelim <value>
- Separates vector element values in the values list (optional).
- Default: " "
+ Separates vector element values in the values list (optional). Default: " "
-os | --omitStrength
- Do not write the strength to the output files (optional), Default:
- false.
- This option is used to output indexable data for creating a search engine
- recommender.
+ Do not write the strength to the output files (optional), Default: false.
+ This option is used to output indexable data for creating a search engine recommender.
Default delimiters will produce output of the form:
"itemID1<tab>itemID2:value2<space>itemID10:value10..."
File discovery options:
-r | --recursive
- Searched the -i path recursively for files that match
- --filenamePattern (optional), Default: false
+ Searched the -i path recursively for files that match --filenamePattern (optional), Default: false
-fp <value> | --filenamePattern <value>
- Regex to match in determining input files (optional). Default:
- filename in the --input option or "^part-.*" if --input is a directory
+ Regex to match in determining input files (optional). Default: filename in the --input option or "^part-.*" if --input is a directory
Spark config options:
-ma <value> | --master <value>
- Spark Master URL (optional). Default: "local". Note that you can
- specify the number of cores to get a performance improvement, for
- example "local[4]"
+ Spark Master URL (optional). Default: "local". Note that you can specify the number of cores to get a performance improvement, for example "local[4]"
-sem <value> | --sparkExecutorMem <value>
- Max Java heap available as "executor memory" on each node (optional).
- Default: 4g
-
- General config options:
+ Max Java heap available as "executor memory" on each node (optional). Default: 4g
-rs <value> | --randomSeed <value>
-h | --help