Nobody reads the docs. If the program itself can describe the user's mistake, instead of just barfing, it should. This is a case of Passive-Aggressive Error Reporting.
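Here is a rough sketch of the kind of check I mean, in Java. The class name, method name, and parameters are made up for illustration; this is not the actual KMeansDriver or KMeansMapper code, and it assumes the driver can see both the -k value and whatever clusters already sit under the -c path before it launches the job.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch only; not part of Mahout. */
public class ClusterArgsCheck {

    /**
     * Returns null when the arguments look usable, otherwise a message
     * that names the likely mistake and how to fix it.
     *
     * @param clustersPath    value passed to -c
     * @param numClusters     value passed to -k, or null when -k was omitted
     * @param initialClusters clusters already present under clustersPath
     */
    static String describeProblem(String clustersPath,
                                  Integer numClusters,
                                  List<String> initialClusters) {
        if (numClusters != null) {
            if (numClusters <= 0) {
                return "-k must be a positive number of clusters, got " + numClusters + ".";
            }
            // With -k given, clusters are sampled from the input, so an
            // empty -c path is not an error.
            return null;
        }
        if (initialClusters.isEmpty()) {
            return "No initial clusters found in " + clustersPath + ". "
                + "Either add -k <number of clusters> to sample them from the input, "
                + "or put your own initial clusters in the -c path.";
        }
        return null;
    }

    public static void main(String[] args) {
        // The situation from the JIRA comment: -c given, -k omitted, no clusters.
        System.out.println(
            describeProblem("./examples/bin/work/clusters", null, Collections.<String>emptyList()));
    }
}
```

The point is only that the message says what the user probably forgot, not just what the mapper failed to find.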

On Wed, Feb 15, 2012 at 7:20 AM, Jeff Eastman <[email protected]> wrote:
> The error message describes what the algorithm can see: that there are no
> initial clusters. The wiki documentation seems reasonably clear on the use
> of -k (https://cwiki.apache.org/confluence/display/MAHOUT/K-Means+Clustering)
> to obtain them by sampling the input dataset, otherwise -c needs to contain
> clusters produced by the user.
>
> On 2/14/12 8:04 PM, Lance Norskog wrote:
>>
>> Could the error message describe the user's mistake?
>>
>> On Tue, Feb 14, 2012 at 9:16 AM, Jeff Eastman <[email protected]> wrote:
>>>
>>> +1 bingo. K-Means is expecting you to provide the prior cluster centers in
>>> -c. If you want it to sample from your input data you need to add the -k
>>> option to tell it how many you want. This has been a constant part of the
>>> api for some time, hence 0.4, 0.5 and 0.6 will all give the same error if
>>> you overlook this argument.
>>>
>>> On 2/14/12 8:56 AM, Suneel Marthi wrote:
>>>>
>>>> You are not specifying the number of clusters that need to be generated,
>>>> try running again by specifying a -k<number of clusters> option. You also
>>>> need to specify that you need clustering to be done with -cl.
>>>>
>>>> For example:-
>>>>
>>>> ./bin/mahout kmeans -i ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x 10 -ow -k 20 -cl
>>>>
>>>> ________________________________
>>>> From: qiang xu (Issue Comment Edited) (JIRA)<[email protected]>
>>>> To: [email protected]
>>>> Sent: Tuesday, February 14, 2012 10:48 AM
>>>> Subject: [jira] [Issue Comment Edited] (MAHOUT-504) Kmeans clustering error
>>>>
>>>> [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207675#comment-13207675 ]
>>>>
>>>> qiang xu edited comment on MAHOUT-504 at 2/14/12 3:46 PM:
>>>> ----------------------------------------------------------
>>>>
>>>> This problem still exist in mahout 0.5 and 0.6
>>>> ./bin/mahout kmeans -i ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x 10 -ow
>>>> Running on hadoop, using HADOOP_HOME=/data/hadoop_cluster/hadoop-0.20.2/
>>>> HADOOP_CONF_DIR=/data/hadoop_cluster/hadoop-0.20.2/conf/
>>>> 12/02/14 20:56:03 INFO common.AbstractJob: Command line arguments: {--clusters=./examples/bin/work/clusters, --convergenceDelta=0.5, --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure, --endPhase=2147483647, --input=./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/, --maxIter=10, --method=mapreduce, --output=./examples/bin/work/reuters-kmeans, --overwrite=null, --startPhase=0, --tempDir=temp}
>>>> 12/02/14 20:56:03 INFO kmeans.KMeansDriver: Input: examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors Clusters In: examples/bin/work/clusters Out: examples/bin/work/reuters-kmeans Distance: org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure
>>>> 12/02/14 20:56:03 INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
>>>> 12/02/14 20:56:03 INFO kmeans.KMeansDriver: K-Means Iteration 1
>>>> 12/02/14 20:56:05 INFO input.FileInputFormat: Total input paths to process : 1
>>>> 12/02/14 20:56:06 INFO mapred.JobClient: Running job: job_201202131515_0122
>>>> 12/02/14 20:56:07 INFO mapred.JobClient: map 0% reduce 0%
>>>> 12/02/14 20:56:16 INFO mapred.JobClient: Task Id : attempt_201202131515_0122_m_000000_0, Status : FAILED
>>>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>>>     at org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:60)
>>>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>> It is really weired that cluster is gernerated
>>>> [root@qxutest mahout-distribution-0.5]# hadoop fs -ls /user/root/examples/bin/work/
>>>> Found 4 items
>>>> drwxr-xr-x - root supergroup 0 2012-02-14 20:55 /user/root/examples/bin/work/clusters
>>>> drwxr-xr-x - root supergroup 0 2012-02-14 20:56 /user/root/examples/bin/work/reuters-kmeans
>>>> drwxr-xr-x - root supergroup 0 2012-02-14 20:29 /user/root/examples/bin/work/reuters-out-seqdir
>>>> drwxr-xr-x - root supergroup 0 2012-02-14 20:32 /user/root/examples/bin/work/reuters-out-seqdir-sparse
>>>> [root@qxutest mahout-distribution-0.5]# hadoop fs -ls /user/root/examples/bin/work/clusters
>>>> Found 1 items
>>>> -rw-r--r-- 2 root supergroup 139 2012-02-14 20:55 /user/root/examples/bin/work/clusters/part-randomSeed
>>>>
>>>> I follow the guide in
>>>> https://cwiki.apache.org/MAHOUT/k-means-clustering.html
>>>>
>>>>> Kmeans clustering error
>>>>> -----------------------
>>>>>
>>>>> Key: MAHOUT-504
>>>>> URL: https://issues.apache.org/jira/browse/MAHOUT-504
>>>>> Project: Mahout
>>>>> Issue Type: Bug
>>>>> Reporter: Zhen Guo
>>>>> Assignee: Robin Anil
>>>>> Fix For: 0.4
>>>>>
>>>>> I tried the Kmeans algorithm on the Synthetic Control data. The following
>>>>> error appears. I tried the Canopy algorithm, it is fine. This error is from
>>>>> Mapper. I am using Trunk.
>>>>> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
>>>>> java.lang.IllegalStateException: Cluster is empty!
>>>>>     at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>>>
>>>> --
>>>> This message is automatically generated by JIRA.
>>>> If you think it was sent incorrectly, please contact your JIRA administrators:
>>>> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
>>>> For more information on JIRA, see: http://www.atlassian.com/software/jira

--
Lance Norskog
[email protected]
