Re: [jira] [Issue Comment Edited] (MAHOUT-504) Kmeans clustering error

Lance Norskog Tue, 14 Feb 2012 19:04:47 -0800

Could the error message describe the user's mistake?
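
One way such a message could read, as a rough sketch only: the shell function below is a hypothetical pre-flight check, not part of Mahout, and its name and wording are assumptions. It looks only at the -c and -k options discussed in this thread and states the likely mistake before the job is submitted. Passing -c without -k is legitimate when prior cluster centers already exist at that path, so the result is better read as a warning than as an error.

```shell
# Hypothetical helper, not part of Mahout: checks the arguments meant for
# `mahout kmeans` and describes the likely mistake before the job runs.
# Assumption: only the short options -c and -k are recognised here.
check_kmeans_args() {
  ck_has_c=0
  ck_has_k=0
  for ck_arg in "$@"; do
    case "$ck_arg" in
      -c) ck_has_c=1 ;;
      -k) ck_has_k=1 ;;
    esac
  done
  # -c without -k means K-Means expects prior centers at the -c path.
  if [ "$ck_has_c" -eq 1 ] && [ "$ck_has_k" -eq 0 ]; then
    echo "-c was given without -k: K-Means will expect existing cluster centers at the -c path."
    echo "To sample initial centers from the input instead, add -k <number of clusters>."
    return 1
  fi
  return 0
}
```

Called with the arguments of the failing command quoted below (-i, -c, -o, -x 10, -ow), the function prints the two lines and returns 1; with -k 20 added it prints nothing and returns 0.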

On Tue, Feb 14, 2012 at 9:16 AM, Jeff Eastman
<[email protected]> wrote:
> +1 bingo. K-Means expects you to provide the prior cluster centers in
> -c. If you want it to sample from your input data instead, you need to add
> the -k option to tell it how many you want. This has been a constant part
> of the API for some time, hence 0.4, 0.5 and 0.6 will all give the same
> error if you overlook this argument.
>
>
>
> On 2/14/12 8:56 AM, Suneel Marthi wrote:
>>
>> You are not specifying the number of clusters to generate. Try running
>> again with the -k <number of clusters> option. You also need to pass -cl
>> to request that clustering be done.
>>
>> For example:
>>
>> ./bin/mahout kmeans -i
>> ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c
>> ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x
>> 10  -ow -k 20 -cl
>>
>>
>>
>> ________________________________
>>  From: qiang xu (Issue Comment Edited) (JIRA)<[email protected]>
>> To: [email protected]
>> Sent: Tuesday, February 14, 2012 10:48 AM
>> Subject: [jira] [Issue Comment Edited] (MAHOUT-504) Kmeans clustering
>> error
>>
>>
>>     [
>> https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207675#comment-13207675
>> ]
>>
>> qiang xu edited comment on MAHOUT-504 at 2/14/12 3:46 PM:
>> ----------------------------------------------------------
>>
>> This problem still exists in Mahout 0.5 and 0.6:
>> ./bin/mahout kmeans -i
>> ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c
>> ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x 10
>>  -ow
>> Running on hadoop, using HADOOP_HOME=/data/hadoop_cluster/hadoop-0.20.2/
>> HADOOP_CONF_DIR=/data/hadoop_cluster/hadoop-0.20.2/conf/
>> 12/02/14 20:56:03 INFO common.AbstractJob: Command line arguments:
>> {--clusters=./examples/bin/work/clusters, --convergenceDelta=0.5,
>> --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure,
>> --endPhase=2147483647,
>> --input=./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/,
>> --maxIter=10, --method=mapreduce,
>> --output=./examples/bin/work/reuters-kmeans, --overwrite=null,
>> --startPhase=0, --tempDir=temp}
>> 12/02/14 20:56:03 INFO kmeans.KMeansDriver: Input:
>> examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors Clusters In:
>> examples/bin/work/clusters Out: examples/bin/work/reuters-kmeans Distance:
>> org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure
>> 12/02/14 20:56:03 INFO kmeans.KMeansDriver: convergence: 0.5 max
>> Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input
>> Vectors: {}
>> 12/02/14 20:56:03 INFO kmeans.KMeansDriver: K-Means Iteration 1
>> 12/02/14 20:56:05 INFO input.FileInputFormat: Total input paths to process
>> : 1
>> 12/02/14 20:56:06 INFO mapred.JobClient: Running job:
>> job_201202131515_0122
>> 12/02/14 20:56:07 INFO mapred.JobClient:  map 0% reduce 0%
>> 12/02/14 20:56:16 INFO mapred.JobClient: Task Id :
>> attempt_201202131515_0122_m_000000_0, Status : FAILED
>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>         at
>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:60)
>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> It is really weird, because the cluster is generated:
>> [root@qxutest mahout-distribution-0.5]# hadoop fs -ls
>> /user/root/examples/bin/work/
>> Found 4 items
>> drwxr-xr-x   - root supergroup          0 2012-02-14 20:55
>> /user/root/examples/bin/work/clusters
>> drwxr-xr-x   - root supergroup          0 2012-02-14 20:56
>> /user/root/examples/bin/work/reuters-kmeans
>> drwxr-xr-x   - root supergroup          0 2012-02-14 20:29
>> /user/root/examples/bin/work/reuters-out-seqdir
>> drwxr-xr-x   - root supergroup          0 2012-02-14 20:32
>> /user/root/examples/bin/work/reuters-out-seqdir-sparse
>> [root@qxutest mahout-distribution-0.5]# hadoop fs -ls
>> /user/root/examples/bin/work/clusters
>> Found 1 items
>> -rw-r--r--   2 root supergroup        139 2012-02-14 20:55
>> /user/root/examples/bin/work/clusters/part-randomSeed
>>
>> I followed the guide at
>> https://cwiki.apache.org/MAHOUT/k-means-clustering.html
>>                       was (Author: skaterxu):
>>     ./bin/mahout kmeans -i
>> ./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/ -c
>> ./examples/bin/work/clusters -o ./examples/bin/work/reuters-kmeans -x 10
>>  -ow
>> Running on hadoop, using HADOOP_HOME=/data/hadoop_cluster/hadoop-0.20.2/
>> HADOOP_CONF_DIR=/data/hadoop_cluster/hadoop-0.20.2/conf/
>> 12/02/14 20:56:03 INFO common.AbstractJob: Command line arguments:
>> {--clusters=./examples/bin/work/clusters, --convergenceDelta=0.5,
>> --distanceMeasure=org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure,
>> --endPhase=2147483647,
>> --input=./examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors/,
>> --maxIter=10, --method=mapreduce,
>> --output=./examples/bin/work/reuters-kmeans, --overwrite=null,
>> --startPhase=0, --tempDir=temp}
>> 12/02/14 20:56:03 INFO kmeans.KMeansDriver: Input:
>> examples/bin/work/reuters-out-seqdir-sparse/tfidf-vectors Clusters In:
>> examples/bin/work/clusters Out: examples/bin/work/reuters-kmeans Distance:
>> org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure
>> 12/02/14 20:56:03 INFO kmeans.KMeansDriver: convergence: 0.5 max
>> Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input
>> Vectors: {}
>> 12/02/14 20:56:03 INFO kmeans.KMeansDriver: K-Means Iteration 1
>> 12/02/14 20:56:05 INFO input.FileInputFormat: Total input paths to process
>> : 1
>> 12/02/14 20:56:06 INFO mapred.JobClient: Running job:
>> job_201202131515_0122
>> 12/02/14 20:56:07 INFO mapred.JobClient:  map 0% reduce 0%
>> 12/02/14 20:56:16 INFO mapred.JobClient: Task Id :
>> attempt_201202131515_0122_m_000000_0, Status : FAILED
>> java.lang.IllegalStateException: No clusters found. Check your -c path.
>>         at
>> org.apache.mahout.clustering.kmeans.KMeansMapper.setup(KMeansMapper.java:60)
>>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>         at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> It is really weird, because the cluster is generated:
>> [root@qxutest mahout-distribution-0.5]# hadoop fs -ls
>> /user/root/examples/bin/work/
>> Found 4 items
>> drwxr-xr-x   - root supergroup          0 2012-02-14 20:55
>> /user/root/examples/bin/work/clusters
>> drwxr-xr-x   - root supergroup          0 2012-02-14 20:56
>> /user/root/examples/bin/work/reuters-kmeans
>> drwxr-xr-x   - root supergroup          0 2012-02-14 20:29
>> /user/root/examples/bin/work/reuters-out-seqdir
>> drwxr-xr-x   - root supergroup          0 2012-02-14 20:32
>> /user/root/examples/bin/work/reuters-out-seqdir-sparse
>> [root@qxutest mahout-distribution-0.5]# hadoop fs -ls
>> /user/root/examples/bin/work/clusters
>> Found 1 items
>> -rw-r--r--   2 root supergroup        139 2012-02-14 20:55
>> /user/root/examples/bin/work/clusters/part-randomSeed
>>
>> I followed the guide at
>> https://cwiki.apache.org/MAHOUT/k-means-clustering.html
>>
>>> Kmeans clustering error
>>> -----------------------
>>>
>>>                  Key: MAHOUT-504
>>>                  URL: https://issues.apache.org/jira/browse/MAHOUT-504
>>>              Project: Mahout
>>>           Issue Type: Bug
>>>             Reporter: Zhen Guo
>>>             Assignee: Robin Anil
>>>              Fix For: 0.4
>>>
>>>
>>> I tried the Kmeans algorithm on the Synthetic Control data. The following
>>> error appears. I tried the Canopy algorithm and it works fine. This error
>>> is from the Mapper. I am using trunk.
>>> 10/09/20 19:40:06 INFO mapred.JobClient: Task Id :
>>> attempt_201008261432_1324_m_000000_0, Status : FAILED
>>> java.lang.IllegalStateException: Cluster is empty!
>>>     at
>>> org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
>>>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>>
>> --
>> This message is automatically generated by JIRA.
>> If you think it was sent incorrectly, please contact your JIRA
>> administrators:
>> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
>> For more information on JIRA, see: http://www.atlassian.com/software/jira




-- 
Lance Norskog
[email protected]
