Space: Apache Mahout (https://cwiki.apache.org/confluence/display/MAHOUT)
Page: Cluster Dumper (https://cwiki.apache.org/confluence/display/MAHOUT/Cluster+Dumper)
Added by Joe Prasanna Kumar:

h1. Introduction

Clustering tasks in Mahout write their output as a SequenceFile (Text, Cluster), where the Text key is a cluster identifier string. To analyze this output, the sequence files must be converted to a human-readable format. The clusterdump utility does this.

h1. Steps for analyzing cluster output using clusterdump utility

After you've executed a clustering task (either an example or a real-world job):

# Get the output data from Hadoop onto your local machine. For example, if you've executed a clustering example, use
$HADOOP_HOME/bin/hadoop fs -get output $MAHOUT_HOME/examples/output
# Run the clusterdump utility as follows
$MAHOUT_HOME/bin/mahout clusterdump --seqFileDir $MAHOUT_HOME/examples/output/clusters-10 --pointsDir $MAHOUT_HOME/examples/output/clusteredPoints/ --output $MAHOUT_HOME/examples/output/clusteranalyze.txt

This will output the clusters into a file called clusteranalyze.txt inside $MAHOUT_HOME/examples/output

Sample data will look like
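The download and clusterdump steps described on this page can also be combined into one small script. The following is a minimal sketch, not part of the original page. The fallback paths `/opt/hadoop` and `/opt/mahout`, the `DRY_RUN` switch, and the helper names `run` and `dump_clusters` are assumptions made for illustration. The two commands and their options are taken verbatim from the steps on this page. By default the script only prints the commands, so it can be checked before anything is executed.

```shell
#!/bin/sh
# Sketch: fetch clustering output from Hadoop and dump it with clusterdump.
# Fallback install locations are hypothetical; set HADOOP_HOME and
# MAHOUT_HOME in your environment to match your own setup.
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"
MAHOUT_HOME="${MAHOUT_HOME:-/opt/mahout}"
OUT_DIR="$MAHOUT_HOME/examples/output"

# run: print the command when DRY_RUN=1 (the default in this sketch),
# otherwise execute it. DRY_RUN is a hypothetical switch, not a Mahout option.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

dump_clusters() {
  # Step 1: copy the job output from HDFS to the local machine.
  run "$HADOOP_HOME/bin/hadoop" fs -get output "$OUT_DIR"
  # Step 2: convert the sequence files to a human-readable text file.
  run "$MAHOUT_HOME/bin/mahout" clusterdump \
    --seqFileDir "$OUT_DIR/clusters-10" \
    --pointsDir "$OUT_DIR/clusteredPoints/" \
    --output "$OUT_DIR/clusteranalyze.txt"
}

dump_clusters
```

Running the script with `DRY_RUN=0` executes both commands for real. The `clusters-10` directory name comes from the example on this page and may differ for other clustering runs.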
