Looks like it was just changed to -i (--input), likely for uniformity
with other CLI operations. The documentation needs to be updated.
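For reference, here is a sketch of what the corrected invocation might look like, assuming --input simply takes the place of --seqFileDir, as the usage text in the quoted message suggests. The clusters path is also an assumption: the HDFS listing quoted below shows output/clusters-0-final and no output/clusters-10, so the sketch points at the former. The default MAHOUT_HOME value is a placeholder. This is untested against trunk; the script prints the command and only runs it if the mahout launcher is actually found.

```shell
#!/bin/sh
# Sketch only: the command from the quoted message with --seqFileDir
# replaced by --input (the flag shown in the clusterdump usage text).
# Assumption: output/clusters-0-final is the directory to dump, since it is
# the only clusters-* directory in the HDFS listing.
MAHOUT_HOME="${MAHOUT_HOME:-/path/to/mahout}"   # placeholder default, adjust

# Backslash-newline inside double quotes is a line continuation,
# so CMD ends up as a single line.
CMD="$MAHOUT_HOME/bin/mahout clusterdump \
--input output/clusters-0-final \
--pointsDir output/clusteredPoints \
--output $MAHOUT_HOME/examples/output/clusteranalyze.txt"

# Always show the command; execute it only when the launcher exists.
echo "$CMD"
if [ -x "$MAHOUT_HOME/bin/mahout" ]; then
  $CMD
fi
```

If the clusters directory differs on your setup, only the --input value should need to change.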
On 2/16/12 10:31 AM, Tharindu Mathew wrote:
Hi,
I'm trying out the synthetic control example and noticed the cluster dumper
command located at [1] does not work.
I'd appreciate it if anyone could correct my command. It seems --seqFileDir is
deprecated. I tried a few more combinations of options, and they also failed
to work.
[1] - https://cwiki.apache.org/confluence/display/MAHOUT/Cluster+Dumper
Files in HDFS at output:
$ bin/hadoop fs -lsr output
drwxr-xr-x - mackie supergroup 0 2012-02-16 21:32 /user/mackie/output/clusteredPoints
-rw-r--r-- 1 mackie supergroup 0 2012-02-16 21:32 /user/mackie/output/clusteredPoints/_SUCCESS
drwxr-xr-x - mackie supergroup 0 2012-02-16 21:31 /user/mackie/output/clusteredPoints/_logs
drwxr-xr-x - mackie supergroup 0 2012-02-16 21:31 /user/mackie/output/clusteredPoints/_logs/history
-rw-r--r-- 1 mackie supergroup 7105 2012-02-16 21:31 /user/mackie/output/clusteredPoints/_logs/history/job_201202162112_0005_1329408095893_mackie_Canopy+Driver+running+clusterData+over+input%3A+outp
-rw-r--r-- 1 mackie supergroup 20634 2012-02-16 21:31 /user/mackie/output/clusteredPoints/_logs/history/job_201202162112_0005_conf.xml
-rw-r--r-- 1 mackie supergroup 340891 2012-02-16 21:31 /user/mackie/output/clusteredPoints/part-m-00000
drwxr-xr-x - mackie supergroup 0 2012-02-16 21:31 /user/mackie/output/clusters-0-final
-rw-r--r-- 1 mackie supergroup 0 2012-02-16 21:31 /user/mackie/output/clusters-0-final/_SUCCESS
drwxr-xr-x - mackie supergroup 0 2012-02-16 21:30 /user/mackie/output/clusters-0-final/_logs
drwxr-xr-x - mackie supergroup 0 2012-02-16 21:30 /user/mackie/output/clusters-0-final/_logs/history
-rw-r--r-- 1 mackie supergroup 10696 2012-02-16 21:30 /user/mackie/output/clusters-0-final/_logs/history/job_201202162112_0004_1329408047297_mackie_Canopy+Driver+running+buildClusters+over+input%3A+ou
-rw-r--r-- 1 mackie supergroup 20920 2012-02-16 21:30 /user/mackie/output/clusters-0-final/_logs/history/job_201202162112_0004_conf.xml
-rw-r--r-- 1 mackie supergroup 6747 2012-02-16 21:31 /user/mackie/output/clusters-0-final/part-r-00000
drwxr-xr-x - mackie supergroup 0 2012-02-16 21:30 /user/mackie/output/data
-rw-r--r-- 1 mackie supergroup 0 2012-02-16 21:30 /user/mackie/output/data/_SUCCESS
drwxr-xr-x - mackie supergroup 0 2012-02-16 21:30 /user/mackie/output/data/_logs
drwxr-xr-x - mackie supergroup 0 2012-02-16 21:30 /user/mackie/output/data/_logs/history
-rw-r--r-- 1 mackie supergroup 7063 2012-02-16 21:30 /user/mackie/output/data/_logs/history/job_201202162112_0003_1329408010408_mackie_Input+Driver+running+over+input%3A+testdata
-rw-r--r-- 1 mackie supergroup 19845 2012-02-16 21:30 /user/mackie/output/data/_logs/history/job_201202162112_0003_conf.xml
-rw-r--r-- 1 mackie supergroup 335470 2012-02-16 21:30 /user/mackie/output/data/part-m-00000
Here's my output:
$ $MAHOUT_HOME/bin/mahout clusterdump --seqFileDir output/clusters-10 --pointsDir output/clusteredPoints --output $MAHOUT_HOME/examples/output/clusteranalyze.txt
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/Users/mackie/devtools/hadoop-0.20.204.0
No HADOOP_CONF_DIR set, using /Users/mackie/devtools/hadoop-0.20.204.0/conf
MAHOUT-JOB: /Users/mackie/source-checkouts/mahout-trunk/examples/target/mahout-examples-0.7-SNAPSHOT-job.jar
12/02/16 22:50:31 ERROR common.AbstractJob: Unexpected --seqFileDir while processing Job-Specific Options:
usage: <command> [Generic Options] [Job-Specific Options]
Generic Options:
 -archives <paths>             comma separated archives to be unarchived on the compute machines.
 -conf <configuration file>    specify an application configuration file
 -D <property=value>           use value for given property
 -files <paths>                comma separated files to be copied to the map reduce cluster
 -fs <local|namenode:port>     specify a namenode
 -jt <local|jobtracker:port>   specify a job tracker
 -libjars <paths>              comma separated jar files to include in the classpath.
 -tokenCacheFile <tokensFile>  name of the file with the tokens
Unexpected --seqFileDir while processing Job-Specific Options:
Usage:
 [--input <input> --output <output> --outputFormat <outputFormat>
 --substring <substring> --numWords <numWords> --pointsDir <pointsDir>
 --samplePoints <samplePoints> --dictionary <dictionary>
 --dictionaryType <dictionaryType> --evaluate
 --distanceMeasure <distanceMeasure> --help --tempDir <tempDir>
 --startPhase <startPhase> --endPhase <endPhase>]
Job-Specific Options:
  --input (-i) input                       Path to job input directory.
  --output (-o) output                     The directory pathname for output.
  --outputFormat (-of) outputFormat        The optional output format to write the results as.
                                           Options: TEXT, CSV or GRAPH_ML
  --substring (-b) substring               The number of chars of the asFormatString() to print
  --numWords (-n) numWords                 The number of top terms to print
  --pointsDir (-p) pointsDir               The directory containing points sequence files mapping
                                           input vectors to their cluster. If specified, then the
                                           program will output the points associated with a cluster
  --samplePoints (-sp) samplePoints        Specifies the maximum number of points to include _per_
                                           cluster. The default is to include all points
  --dictionary (-d) dictionary             The dictionary file
  --dictionaryType (-dt) dictionaryType    The dictionary file type (text|sequencefile)
  --evaluate (-e)                          Run ClusterEvaluator and CDbwEvaluator over the input.
                                           The output will be appended to the rest of the output
                                           at the end.
  --distanceMeasure (-dm) distanceMeasure  The classname of the DistanceMeasure. Default is
                                           SquaredEuclidean
  --help (-h)                              Print out help
  --tempDir tempDir                        Intermediate output directory
  --startPhase startPhase                  First phase to run
  --endPhase endPhase                      Last phase to run
12/02/16 22:50:31 INFO driver.MahoutDriver: Program took 308 ms (Minutes: 0.0051333333333333335)