Yes, well, almost. I'm refactoring the CDbw implementation to separate out the representativePoints extraction, so that multiple evaluators can work from a tractable set of points for each cluster. I'm adding a new ClusterEvaluator that uses the same code as in "Mahout in Action" for inter-cluster density and, since the intra-cluster density code is not shown in the book, my own attempt at it computed over the representative points. Since that set is reasonably sized, it can all be done in memory. I will commit that today.
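
As a rough illustration of the in-memory computation described above, here is a minimal sketch. It is not the code committed to Mahout and not the code from "Mahout in Action". The class name, the method names, the use of plain double[] points with Euclidean distance, and the (mean - min) / (max - min) normalization are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only; names and the density definition are assumptions.
final class RepresentativePointDensitySketch {

    // Euclidean distance between two points of equal dimension.
    public static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Mean pairwise distance, rescaled to [0, 1] by the min and max pairwise
    // distances (assumed normalization). Returns 0 when there are fewer than
    // two points or when all pairwise distances are equal.
    public static double normalizedMeanPairwiseDistance(List<double[]> points) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        double sum = 0.0;
        int count = 0;
        for (int i = 0; i < points.size(); i++) {
            for (int j = i + 1; j < points.size(); j++) {
                double d = distance(points.get(i), points.get(j));
                min = Math.min(min, d);
                max = Math.max(max, d);
                sum += d;
                count++;
            }
        }
        if (count == 0 || max == min) {
            return 0.0;
        }
        return (sum / count - min) / (max - min);
    }

    // Inter-cluster measure: computed over one center per cluster.
    public static double interClusterDensity(List<double[]> centers) {
        return normalizedMeanPairwiseDistance(centers);
    }

    // Intra-cluster measure: per-cluster value over that cluster's
    // representative points, averaged across clusters.
    public static double intraClusterDensity(List<List<double[]>> representativePoints) {
        if (representativePoints.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (List<double[]> points : representativePoints) {
            sum += normalizedMeanPairwiseDistance(points);
        }
        return sum / representativePoints.size();
    }

    public static void main(String[] args) {
        List<double[]> clusterA = Arrays.asList(
            new double[] {0, 0}, new double[] {3, 4}, new double[] {6, 8});
        List<double[]> clusterB = Arrays.asList(
            new double[] {10, 10}, new double[] {10, 11}, new double[] {12, 10});
        System.out.println(intraClusterDensity(Arrays.asList(clusterA, clusterB)));
    }
}
```

Because each cluster contributes only its representative points, the pairwise loops stay small enough to run in memory even when the clusters themselves are large.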

On 9/24/10 7:44 AM, Sean Owen (JIRA) wrote:
      [ https://issues.apache.org/jira/browse/MAHOUT-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved MAHOUT-236.
------------------------------

     Fix Version/s: 0.5
                        (was: 0.4)
        Resolution: Fixed

If I read this right, Jeff, you're done here, at least for 0.4 purposes?

Cluster Evaluation Tools
------------------------

                 Key: MAHOUT-236
                 URL: https://issues.apache.org/jira/browse/MAHOUT-236
             Project: Mahout
          Issue Type: New Feature
          Components: Clustering
            Reporter: Grant Ingersoll
            Assignee: Jeff Eastman
             Fix For: 0.5

         Attachments: MAHOUT-236.patch, MAHOUT-236.patch, MAHOUT-236.patch, MAHOUT-236.patch, MAHOUT-236.patch


Per http://www.lucidimagination.com/search/document/10b562f10288993c/validating_clustering_output#9d3f6a55f4a91cb6, it would be great to have some utilities to help evaluate the effectiveness of clustering.
