[CONF] Apache Mahout > Cluster Dumper

confluence Tue, 31 Aug 2010 18:58:24 -0700

Space: Apache Mahout (https://cwiki.apache.org/confluence/display/MAHOUT)
Page: Cluster Dumper (https://cwiki.apache.org/confluence/display/MAHOUT/Cluster+Dumper)


Added by Joe Prasanna Kumar:
---------------------------------------------------------------------
h1. Introduction
Clustering tasks in Mahout write their output as a SequenceFile of 
(Text, Cluster) pairs, where the Text key is a cluster identifier string. To 
analyze this output, the sequence files must be converted to a human-readable 
format. The clusterdump utility does this conversion.

h1. Steps for analyzing cluster output using the clusterdump utility
After you've executed a clustering task (either an example or a real-world job): 
# Copy the output data from Hadoop to your local machine. For example, if 
you've run one of the clustering examples, use
$HADOOP_HOME/bin/hadoop fs -get output $MAHOUT_HOME/examples/output
# Run the clusterdump utility as follows
$MAHOUT_HOME/bin/mahout clusterdump --seqFileDir $MAHOUT_HOME/examples/output/clusters-10 --pointsDir $MAHOUT_HOME/examples/output/clusteredPoints/ --output $MAHOUT_HOME/examples/output/clusteranalyze.txt
This writes the clusters to a file called clusteranalyze.txt inside 
$MAHOUT_HOME/examples/output.
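The two steps above can be combined in a small wrapper script. This is only a sketch. It reuses the paths and options from the commands on this page; the DRY_RUN variable, the function names, and the fallback values for HADOOP_HOME and MAHOUT_HOME are assumptions made for illustration and are not part of Mahout or Hadoop. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the two steps above. Placeholder defaults (assumed install
# locations) are used only when the variables are not already set.
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"
MAHOUT_HOME="${MAHOUT_HOME:-/usr/local/mahout}"
OUT="$MAHOUT_HOME/examples/output"

# DRY_RUN is a hypothetical switch for this script, not a Mahout option.
# With DRY_RUN=1 (the default) the command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Step 1: copy the clustering output from HDFS to the local machine.
fetch_output() {
    run "$HADOOP_HOME/bin/hadoop" fs -get output "$OUT"
}

# Step 2: convert the cluster sequence files to readable text.
dump_clusters() {
    run "$MAHOUT_HOME/bin/mahout" clusterdump \
        --seqFileDir "$OUT/clusters-10" \
        --pointsDir "$OUT/clusteredPoints/" \
        --output "$OUT/clusteranalyze.txt"
}

fetch_output
dump_clusters
```

Running the script with DRY_RUN=0 executes both commands for real. The clusters-10 directory name comes from the example on this page and depends on how many iterations your clustering job ran, so adjust it to match your output.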
Sample data will look like




