You are not specifying the number of clusters to generate. Try running again with the -k <number of clusters> option. You also need to pass -cl to request that the clustering step be run.
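As a rough sketch of the advice above, a small guard can refuse to build the command unless a cluster count is supplied. The function name kmeans_cmd is made up for illustration and is not part of Mahout. It only prints the command line, reusing the paths from this thread, and does not invoke Mahout or Hadoop.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mahout): assemble the kmeans command line
# and fail early when no cluster count is given. It only prints the command.
kmeans_cmd() {
  k="${1:-}"
  if [ -z "$k" ]; then
    echo "error: pass the number of clusters (used for -k)" >&2
    return 1
  fi
  # Paths are the ones used in this thread; adjust them for your own layout.
  echo "./bin/mahout kmeans" \
    "-i ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/" \
    "-c ./examples/bin/work/clusters" \
    "-o ./examples/bin/work/reuters-kmeans" \
    "-x 10 -ow -k $k -cl"
}

# Print the command for 20 clusters.
kmeans_cmd 20
```

Printing rather than executing keeps the sketch safe to try on a machine without Hadoop; the printed line can be reviewed and then run by hand.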
For example:

./bin/mahout kmeans -i ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x 10 -ow -k 20 -cl

________________________________
From: qiang xu (Issue Comment Edited) (JIRA) <[email protected]>
To: [email protected]
Sent: Tuesday, February 14, 2012 10:48 AM
Subject: [jira] [Issue Comment Edited] (MAHOUT-504) Kmeans clustering error

[ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207675#comment-13207675 ]

qiang xu edited comment on MAHOUT-504 at 2/14/12 3:46 PM:
----------------------------------------------------------

This problem still exists in Mahout 0.5 and 0.6.

./bin/mahout kmeans -i ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x 10 -ow

Running on hadoop, using HADOOP_HOME=/data/hadoop_cluster/hadoop-0.20.2/
HADOOP_CONF_DIR=/data/hadoop_cluster/hadoop-0.20.2/conf/
12/02/14 20:56:03 INFO common.AbstractJob: Command line arguments: {--clusters=./examples/bin/work/clusters, --convergenceDelta=0.5, --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure, --endPhase=2147483647, --input=./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/, --maxIter=10, --method=mapreduce, --output=./examples/bin/work/reuters-kmeans, --overwrite=null, --startPhase=0, --tempDir=temp}
12/02/14 20:56:03 INFO kmeans.KMeansDriver: Input: examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors Clusters In: examples/bin/work/clusters Out: examples/bin/work/reuters-kmeans Distance: org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure
12/02/14 20:56:03 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
12/02/14 20:56:03 INFO kmeans.KMeansDriver: K-Means Iteration 1
12/02/14 20:56:05 INFO input.FileInputFormat: Total input paths to process : 1
12/02/14 20:56:06 INFO mapred.JobClient: Running job: job_201202131515_0122
12/02/14 20:56:07 INFO mapred.JobClient: map 0% reduce 0%
12/02/14 20:56:16 INFO mapred.JobClient: Task Id : attempt_201202131515_0122_m_000000_0, Status : FAILED
java.lang.IllegalStateException: No clusters found. Check your -c path.
    at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:60)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

It is really weird, because the clusters directory is generated:

[root@qxutest mahout-distribution-0.5]# hadoop fs -ls /user/root/examples/bin/work/
Found 4 items
drwxr-xr-x - root supergroup 0 2012-02-14 20:55 /user/root/examples/bin/work/clusters
drwxr-xr-x - root supergroup 0 2012-02-14 20:56 /user/root/examples/bin/work/reuters-kmeans
drwxr-xr-x - root supergroup 0 2012-02-14 20:29 /user/root/examples/bin/work/reuters-out-seqdir
drwxr-xr-x - root supergroup 0 2012-02-14 20:32 /user/root/examples/bin/work/reuters-out-seqdir-sparse
[root@qxutest mahout-distribution-0.5]# hadoop fs -ls /user/root/examples/bin/work/clusters
Found 1 items
-rw-r--r-- 2 root supergroup 139 2012-02-14 20:55 /user/root/examples/bin/work/clusters/part-randomSeed

I followed the guide at https://cwiki.apache.org/MAHOUT/k-means-clustering.html

> Kmeans clustering error
> -----------------------
>
> Key: MAHOUT-504
> URL: https://issues.apache.org/jira/browse/MAHOUT-504
> Project: Mahout
> Issue Type: Bug
> Reporter: Zhen Guo
> Assignee: Robin Anil
> Fix For: 0.4
>
>
> I tried the Kmeans algorithm on the Synthetic Control data. The following
> error appears. I tried the Canopy algorithm and it works fine. This error is
> from the Mapper. I am using Trunk.
> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
> java.lang.IllegalStateException: Cluster is empty!
>     at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
