Could the error message describe the user's mistake?

On Tue, Feb 14, 2012 at 9:16 AM, Jeff Eastman <[email protected]> wrote:
> +1 bingo. K-Means is expecting you to provide the prior cluster centers in
> -c. If you want it to sample from your input data you need to add the -k
> option to tell it how many you want. This has been a constant part of the
> API for some time, hence 0.4, 0.5 and 0.6 will all give the same error if
> you overlook this argument.
>
> On 2/14/12 8:56 AM, Suneel Marthi wrote:
>>
>> You are not specifying the number of clusters that need to be generated;
>> try running again with a -k <number of clusters> option. You also
>> need to specify that you need clustering to be done with -cl.
>>
>> For example:
>>
>> ./bin/mahout kmeans -i
>> ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c
>> ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x
>> 10 -ow -k 20 -cl
>>
>> ________________________________
>> From: qiang xu (Issue Comment Edited) (JIRA)<[email protected]>
>> To: [email protected]
>> Sent: Tuesday, February 14, 2012 10:48 AM
>> Subject: [jira] [Issue Comment Edited] (MAHOUT-504) Kmeans clustering
>> error
>>
>> [
>> https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207675#comment-13207675
>> ]
>>
>> qiang xu edited comment on MAHOUT-504 at 2/14/12 3:46 PM:
>> ----------------------------------------------------------
>>
>> This problem still exists in Mahout 0.5 and 0.6
>> ./bin/mahout kmeans -i
>> ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c
>> ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x 10
>> -ow
>> Running on hadoop, using HADOOP_HOME=/data/hadoop_cluster/hadoop-0.20.2/
>> HADOOP_CONF_DIR=/data/hadoop_cluster/hadoop-0.20.2/conf/
>> 12/02/14 20:56:03 INFO common.AbstractJob: Command line arguments:
>> {--clusters=./examples/bin/work/clusters, --convergenceDelta=0.5,
>> --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure,
>> --endPhase=2147483647,
>> --input=./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/,
>> --maxIter=10, --method=mapreduce,
>> --output=./examples/bin/work/reuters-kmeans, --overwrite=null,
>> --startPhase=0, --tempDir=temp}
>> 12/02/14 20:56:03 INFO kmeans.KMeansDriver: Input:
>> examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors Clusters In:
>> examples/bin/work/clusters Out: examples/bin/work/reuters-kmeans Distance:
>> org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure
>> 12/02/14 20:56:03 INFO kmeans.KMeansDriver: convergence: 0.5 max
>> Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input
>> Vectors: {}
>> 12/02/14 20:56:03 INFO kmeans.KMeansDriver: K-Means Iteration 1
>> 12/02/14 20:56:05 INFO input.FileInputFormat: Total input paths to process
>> : 1
>> 12/02/14 20:56:06 INFO mapred.JobClient: Running job:
>> job_201202131515_0122
>> 12/02/14 20:56:07 INFO mapred.JobClient: map 0% reduce 0%
>> 12/02/14 20:56:16 INFO mapred.JobClient: Task Id :
>> attempt_201202131515_0122_m_000000_0, Status : FAILED
>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>> at
>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:60)
>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>> at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> It is really weird that the cluster is generated:
>> [root@qxutest mahout-distribution-0.5]# hadoop fs -ls
>> /user/root/examples/bin/work/
>> Found 4 items
>> drwxr-xr-x - root supergroup 0 2012-02-14 20:55
>> /user/root/examples/bin/work/clusters
>> drwxr-xr-x - root supergroup 0 2012-02-14 20:56
>> /user/root/examples/bin/work/reuters-kmeans
>> drwxr-xr-x - root supergroup 0 2012-02-14 20:29
>> /user/root/examples/bin/work/reuters-out-seqdir
>> drwxr-xr-x - root supergroup 0 2012-02-14 20:32
>> /user/root/examples/bin/work/reuters-out-seqdir-sparse
>> [root@qxutest mahout-distribution-0.5]# hadoop fs -ls
>> /user/root/examples/bin/work/clusters
>> Found 1 items
>> -rw-r--r-- 2 root supergroup 139 2012-02-14 20:55
>> /user/root/examples/bin/work/clusters/part-randomSeed
>>
>> I followed the guide at
>> https://cwiki.apache.org/MAHOUT/k-means-clustering.html
>>
>>> Kmeans clustering error
>>> -----------------------
>>>
>>> Key: MAHOUT-504
>>> URL: https://issues.apache.org/jira/browse/MAHOUT-504
>>> Project: Mahout
>>> Issue Type: Bug
>>> Reporter: Zhen Guo
>>> Assignee: Robin Anil
>>> Fix For: 0.4
>>>
>>> I tried the Kmeans algorithm on the Synthetic Control data. The following
>>> error appears. I tried the Canopy algorithm, and it is fine. This error is
>>> from the Mapper. I am using Trunk.
>>> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id :
>>> attempt_201008261432_1324_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>> at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>> at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> --
>> This message is automatically generated by JIRA.
>> If you think it was sent incorrectly, please contact your JIRA
>> administrators:
>> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
>> For more information on JIRA, see: http://www.atlassian.com/software/jira
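
A minimal sketch of the kind of hint the question at the top asks for, written as a pre-flight check around the command line rather than as a change to Mahout itself. The function name kmeans_hint and the message wording are hypothetical and only for illustration; the -c and -k options and their relationship are taken from the replies quoted above. It only looks at the short option -k, which is an assumption about how the job is being invoked.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mahout): scan the arguments meant for
# `mahout kmeans` and explain the -c / -k relationship when -k is absent.
kmeans_hint() {
  for arg in "$@"; do
    if [ "$arg" = "-k" ]; then
      # -k was given, so initial clusters are sampled from the input data.
      return 0
    fi
  done
  # Without -k, the -c path has to hold prior cluster centers already.
  echo "No -k given: the -c path must already contain initial cluster centers." >&2
  echo "Add -k <number of clusters> to have them sampled from the input." >&2
  return 1
}
```

Calling kmeans_hint with the same arguments as the failing command in the quoted comment (no -k) prints the two-line hint and returns a non-zero status; with "-k 20 -cl" appended, as in the suggested command, it prints nothing and returns zero.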
--
Lance Norskog
[email protected]
