Re: Running cluster dumper from trunk build

Jeff Eastman Thu, 16 Feb 2012 10:48:38 -0800

Looks like --seqFileDir was just changed to -i (--input), likely for uniformity with other CLI operations. The documentation needs to be updated.
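
For reference, a minimal sketch of what the corrected invocation might look like. It assumes the clusters directory is output/clusters-0-final (the one that appears in the HDFS listing quoted below, rather than the output/clusters-10 used in the original command) and that MAHOUT_HOME points at the trunk checkout. The option names are the ones printed in the quoted usage message; the shell variable names are only illustrative.

```shell
#!/bin/sh
# Sketch only: paths are taken from the HDFS listing in the quoted message.
MAHOUT_BIN="${MAHOUT_HOME:-.}/bin/mahout"
INPUT_DIR="output/clusters-0-final"   # assumption: the final clusters directory shown in the listing
POINTS_DIR="output/clusteredPoints"
OUTPUT_FILE="${MAHOUT_HOME:-.}/examples/output/clusteranalyze.txt"

# --input (-i) takes the place of the old --seqFileDir option.
set -- clusterdump --input "$INPUT_DIR" --pointsDir "$POINTS_DIR" --output "$OUTPUT_FILE"

if [ -x "$MAHOUT_BIN" ]; then
  "$MAHOUT_BIN" "$@"
else
  # No mahout launcher found: just show the command that would be run.
  echo "would run: $MAHOUT_BIN $*"
fi
```

If the launcher is not found, the script only prints the command, so it can be checked before running it against a real cluster.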


On 2/16/12 10:31 AM, Tharindu Mathew wrote:

Hi,


I'm trying out the synthetic control example and noticed the cluster dumper
command located at [1] does not work.

I'd appreciate it if anyone could correct my command. It seems --seqFileDir
is deprecated. I tried a few other combinations of options, and they also
failed to work.

[1] - https://cwiki.apache.org/confluence/display/MAHOUT/Cluster+Dumper

Files in HDFS at output:

$ bin/hadoop fs -lsr output
drwxr-xr-x   - mackie supergroup          0 2012-02-16 21:32 /user/mackie/output/clusteredPoints
-rw-r--r--   1 mackie supergroup          0 2012-02-16 21:32 /user/mackie/output/clusteredPoints/_SUCCESS
drwxr-xr-x   - mackie supergroup          0 2012-02-16 21:31 /user/mackie/output/clusteredPoints/_logs
drwxr-xr-x   - mackie supergroup          0 2012-02-16 21:31 /user/mackie/output/clusteredPoints/_logs/history
-rw-r--r--   1 mackie supergroup       7105 2012-02-16 21:31 /user/mackie/output/clusteredPoints/_logs/history/job_201202162112_0005_1329408095893_mackie_Canopy+Driver+running+clusterData+over+input%3A+outp
-rw-r--r--   1 mackie supergroup      20634 2012-02-16 21:31 /user/mackie/output/clusteredPoints/_logs/history/job_201202162112_0005_conf.xml
-rw-r--r--   1 mackie supergroup     340891 2012-02-16 21:31 /user/mackie/output/clusteredPoints/part-m-00000
drwxr-xr-x   - mackie supergroup          0 2012-02-16 21:31 /user/mackie/output/clusters-0-final
-rw-r--r--   1 mackie supergroup          0 2012-02-16 21:31 /user/mackie/output/clusters-0-final/_SUCCESS
drwxr-xr-x   - mackie supergroup          0 2012-02-16 21:30 /user/mackie/output/clusters-0-final/_logs
drwxr-xr-x   - mackie supergroup          0 2012-02-16 21:30 /user/mackie/output/clusters-0-final/_logs/history
-rw-r--r--   1 mackie supergroup      10696 2012-02-16 21:30 /user/mackie/output/clusters-0-final/_logs/history/job_201202162112_0004_1329408047297_mackie_Canopy+Driver+running+buildClusters+over+input%3A+ou
-rw-r--r--   1 mackie supergroup      20920 2012-02-16 21:30 /user/mackie/output/clusters-0-final/_logs/history/job_201202162112_0004_conf.xml
-rw-r--r--   1 mackie supergroup       6747 2012-02-16 21:31 /user/mackie/output/clusters-0-final/part-r-00000
drwxr-xr-x   - mackie supergroup          0 2012-02-16 21:30 /user/mackie/output/data
-rw-r--r--   1 mackie supergroup          0 2012-02-16 21:30 /user/mackie/output/data/_SUCCESS
drwxr-xr-x   - mackie supergroup          0 2012-02-16 21:30 /user/mackie/output/data/_logs
drwxr-xr-x   - mackie supergroup          0 2012-02-16 21:30 /user/mackie/output/data/_logs/history
-rw-r--r--   1 mackie supergroup       7063 2012-02-16 21:30 /user/mackie/output/data/_logs/history/job_201202162112_0003_1329408010408_mackie_Input+Driver+running+over+input%3A+testdata
-rw-r--r--   1 mackie supergroup      19845 2012-02-16 21:30 /user/mackie/output/data/_logs/history/job_201202162112_0003_conf.xml
-rw-r--r--   1 mackie supergroup     335470 2012-02-16 21:30 /user/mackie/output/data/part-m-00000

Here's my output:

$ $MAHOUT_HOME/bin/mahout clusterdump --seqFileDir output/clusters-10 --pointsDir output/clusteredPoints --output $MAHOUT_HOME/examples/output/clusteranalyze.txt
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/Users/mackie/devtools/hadoop-0.20.204.0
No HADOOP_CONF_DIR set, using /Users/mackie/devtools/hadoop-0.20.204.0/conf
MAHOUT-JOB: /Users/mackie/source-checkouts/mahout-trunk/examples/target/mahout-examples-0.7-SNAPSHOT-job.jar
12/02/16 22:50:31 ERROR common.AbstractJob: Unexpected --seqFileDir while processing Job-Specific Options:
usage: <command> [Generic Options] [Job-Specific Options]
Generic Options:
  -archives <paths>              comma separated archives to be unarchived
                                 on the compute machines.
  -conf <configuration file>     specify an application configuration file
  -D <property=value>            use value for given property
  -files <paths>                 comma separated files to be copied to the
                                 map reduce cluster
  -fs <local|namenode:port>      specify a namenode
  -jt <local|jobtracker:port>    specify a job tracker
  -libjars <paths>               comma separated jar files to include in
                                 the classpath.
  -tokenCacheFile <tokensFile>   name of the file with the tokens
Unexpected --seqFileDir while processing Job-Specific Options:
Usage:

  [--input <input> --output <output> --outputFormat <outputFormat>
  --substring <substring> --numWords <numWords> --pointsDir <pointsDir>
  --samplePoints <samplePoints> --dictionary <dictionary>
  --dictionaryType <dictionaryType> --evaluate
  --distanceMeasure <distanceMeasure> --help --tempDir <tempDir>
  --startPhase <startPhase> --endPhase <endPhase>]
Job-Specific Options:
   --input (-i) input                         Path to job input directory.
   --output (-o) output                       The directory pathname for output.
   --outputFormat (-of) outputFormat          The optional output format to
                                              write the results as. Options:
                                              TEXT, CSV or GRAPH_ML
   --substring (-b) substring                 The number of chars of the
                                              asFormatString() to print
   --numWords (-n) numWords                   The number of top terms to print
   --pointsDir (-p) pointsDir                 The directory containing points
                                              sequence files mapping input
                                              vectors to their cluster. If
                                              specified, then the program will
                                              output the points associated with
                                              a cluster
   --samplePoints (-sp) samplePoints          Specifies the maximum number of
                                              points to include _per_ cluster.
                                              The default is to include all
                                              points
   --dictionary (-d) dictionary               The dictionary file
   --dictionaryType (-dt) dictionaryType      The dictionary file type
                                              (text|sequencefile)
   --evaluate (-e)                            Run ClusterEvaluator and
                                              CDbwEvaluator over the input. The
                                              output will be appended to the
                                              rest of the output at the end.
   --distanceMeasure (-dm) distanceMeasure    The classname of the
                                              DistanceMeasure. Default is
                                              SquaredEuclidean
   --help (-h)                                Print out help
   --tempDir tempDir                          Intermediate output directory
   --startPhase startPhase                    First phase to run
   --endPhase endPhase                        Last phase to run
12/02/16 22:50:31 INFO driver.MahoutDriver: Program took 308 ms (Minutes: 0.0051333333333333335)
