Yes, well, almost. I'm refactoring the CDbw implementation to separate
the representativePoints extraction so multiple evaluators can be used
with a tractable set of points for each cluster. I'm adding a new
ClusterEvaluator that uses the same code as in "Mahout in Action" for
inter-cluster density. Since the intra-cluster density code is not shown
in the book, it uses my own attempt at it, computed over the
representative points. That set is reasonably sized, so it can all be
done in memory. I will commit that today.
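
To make the idea concrete, here is a rough sketch of how both densities could be computed in memory over a small set of points. It is not the code being committed and not the book's listing. It assumes, purely for illustration, that "density" means the mean pairwise Euclidean distance rescaled by the min and max pairwise distances. The class name DensitySketch and all method names are hypothetical.

```java
import java.util.List;

// Hypothetical sketch; names and the density definition are assumptions,
// not the Mahout ClusterEvaluator API.
final class DensitySketch {

  // Plain Euclidean distance between two equal-length vectors.
  static double euclidean(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  // Mean pairwise distance rescaled to [0, 1] by the min and max pairwise
  // distances. Returns 0 when there are fewer than two points or when all
  // pairwise distances are equal (the rescaling is undefined there).
  static double normalizedMeanPairwiseDistance(List<double[]> points) {
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    double sum = 0.0;
    int count = 0;
    for (int i = 0; i < points.size(); i++) {
      for (int j = i + 1; j < points.size(); j++) {
        double d = euclidean(points.get(i), points.get(j));
        min = Math.min(min, d);
        max = Math.max(max, d);
        sum += d;
        count++;
      }
    }
    if (count == 0 || max == min) {
      return 0.0;
    }
    return (sum / count - min) / (max - min);
  }

  // Inter-cluster measure: computed over one center per cluster.
  static double interClusterDensity(List<double[]> centers) {
    return normalizedMeanPairwiseDistance(centers);
  }

  // Intra-cluster measure: the same statistic over each cluster's
  // representative points, averaged across clusters. Degenerate clusters
  // contribute 0 to the average.
  static double intraClusterDensity(List<List<double[]>> representativePoints) {
    if (representativePoints.isEmpty()) {
      return 0.0;
    }
    double total = 0.0;
    for (List<double[]> cluster : representativePoints) {
      total += normalizedMeanPairwiseDistance(cluster);
    }
    return total / representativePoints.size();
  }
}
```

Because each cluster contributes only a handful of representative points, the quadratic pairwise loop stays cheap, which is why doing it all in memory is reasonable.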
On 9/24/10 7:44 AM, Sean Owen (JIRA) wrote:
[ https://issues.apache.org/jira/browse/MAHOUT-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved MAHOUT-236.
------------------------------
Fix Version/s: 0.5
(was: 0.4)
Resolution: Fixed
If I read this right, Jeff, you're done here, at least for 0.4 purposes?
Cluster Evaluation Tools
------------------------
Key: MAHOUT-236
URL: https://issues.apache.org/jira/browse/MAHOUT-236
Project: Mahout
Issue Type: New Feature
Components: Clustering
Reporter: Grant Ingersoll
Assignee: Jeff Eastman
Fix For: 0.5
Attachments: MAHOUT-236.patch, MAHOUT-236.patch, MAHOUT-236.patch,
MAHOUT-236.patch, MAHOUT-236.patch
Per
http://www.lucidimagination.com/search/document/10b562f10288993c/validating_clustering_output#9d3f6a55f4a91cb6,
it would be great to have some utilities to help evaluate the effectiveness of
clustering.