This cannot be running on the latest trunk. The job no longer has a -c argument, and the initial clusters are always computed by running Canopy on the converted data. The job is meant to be run with no arguments; the defaults (EuclideanDM, 80, 55) work consistently. The only variables are the distance measure and the t1 and t2 values for Canopy. If these are changed, Canopy generates somewhere between 1 and 600 clusters, and k-Means processes them fine.

Predictably, when I run with t1=800 and t2=550 I get a single cluster out; with t1=8 and t2=5.5 I get 600. I cannot imagine any way to get 0 clusters out of Canopy.
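
To illustrate why Canopy should never return 0 clusters on non-empty input, here is a minimal Python sketch of the basic canopy procedure. It is not Mahout's implementation: the names canopy_centers and euclidean are hypothetical, and the toy data mentioned below stands in for the Synthetic Control set. The first point taken from the list always seeds a canopy, so any non-empty input yields at least one.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t1, t2, distance=euclidean):
    """Return a list of (center, members) canopies.

    Illustrative only (assumed names, not Mahout's API). Uses the usual
    loose threshold t1 and tight threshold t2, with t1 > t2.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        # The next unclaimed point always starts a new canopy.
        center = remaining.pop(0)
        members = [center]
        survivors = []
        for p in remaining:
            d = distance(center, p)
            if d < t1:
                members.append(p)    # loosely bound: joins this canopy
            if d >= t2:
                survivors.append(p)  # not tightly bound: may seed a later canopy
        remaining = survivors
        canopies.append((center, members))
    return canopies
```

On a toy set of 600 one-dimensional points spaced 10 apart, thresholds far larger than the data spread give one canopy, thresholds smaller than the spacing give 600, and the count is 0 only when the input itself is empty.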

I think this has been fixed, but show me a command line that can generate this error and I will have something to work with.


On 9/25/10 3:57 AM, Sean Owen (JIRA) wrote:
     [ https://issues.apache.org/jira/browse/MAHOUT-504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen reopened MAHOUT-504:
------------------------------


Kmeans clustering error
-----------------------

                 Key: MAHOUT-504
                 URL: https://issues.apache.org/jira/browse/MAHOUT-504
             Project: Mahout
          Issue Type: Bug
            Reporter: Zhen Guo
            Assignee: Robin Anil
             Fix For: 0.4


I tried the Kmeans algorithm on the Synthetic Control data, and the following error
appears. I also tried the Canopy algorithm, and it runs fine. The error comes from the
Mapper. I am using trunk.
10/09/20 19:40:06 INFO mapred.JobClient: Task Id : attempt_201008261432_1324_m_000000_0, Status : FAILED
java.lang.IllegalStateException: Cluster is empty!
        at org.apache.mahout.clustering.kmeans.KMeansClusterMapper.setup(KMeansClusterMapper.java:57)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
