Re: MAHOUT-236 Cluster Evaluation Tools?

Jeff Eastman Wed, 07 Apr 2010 11:21:07 -0700

Hi Robin,

Interesting paper. I'm beginning to see how to MR the representative point selection already. The rest will hopefully become clearer with more study. Lots of MR jobs are needed to: a) get the data into Vectors, b) iterate (e.g. kmeans) over the data to produce a set of clusters, c) cluster the data, d) iterate over the clustered data to derive representative points for each cluster, and finally e) produce the CDbw. And, of course all of this is again iterated with different values for the clustering algorithm's parameters. Should keep the lights on at PG&E producing power for the server farms.
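To make step d) a bit more concrete, here is a rough in-memory sketch of picking representative points for a single cluster. It is not Mahout code and not necessarily the paper's exact procedure: it assumes a farthest-first heuristic (start with the point farthest from the centroid, then keep adding the point farthest from the representatives chosen so far). The names `distance`, `centroid` and `representative_points` are made up for illustration. In an MR setting, something like this might run per cluster inside a reducer keyed by cluster id.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dims))


def representative_points(points, num_reps):
    """Pick up to num_reps well-scattered points from one cluster.

    Assumption (not taken from the thread): farthest-first traversal.
    The first representative is the point farthest from the centroid;
    each later one maximizes its distance to the nearest representative
    already chosen. Ties go to the earliest point in the input order.
    """
    if not points or num_reps <= 0:
        return []
    center = centroid(points)
    reps = [max(points, key=lambda p: distance(p, center))]
    while len(reps) < min(num_reps, len(points)):
        candidates = [p for p in points if p not in reps]
        if not candidates:
            break  # only duplicates of already chosen points remain
        reps.append(
            max(candidates, key=lambda p: min(distance(p, r) for r in reps))
        )
    return reps


# Hypothetical cluster: four corners of a square plus its middle point.
cluster = [(0, 0), (2, 0), (0, 2), (2, 2), (1, 1)]
reps = representative_points(cluster, 3)
```

The middle point is never selected here because it is always closer to the already chosen representatives than the remaining corners are.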



Robin Anil wrote:

Hi Jeff,
This is a good paper with a simple measure of cluster quality based on
intra-cluster density and inter-cluster separation. It's pretty easy to
compute. We need to make it a map/reduce job:
http://docs.google.com/viewer?a=v&q=cache:z5p9n04cBQEJ:www.db-net.aueb.gr/index.php/corporate/content/download/227/833/file/HV_poster2002.pdf+clustering+quality&hl=en&gl=in&pid=bl&srcid=ADGEESiC-ocW6IWrKR4cb1t1ZqkzRKQ3tDv4UFBkVaUKU0gG3kADcPWIjs-60A0912nu8MFPsVM3pf9jKrP98dL-B-BaiOC9LObBS3VkJK6Mu6josZtVegLxp3BftduD3hFxtGOVZK_b&sig=AHIEtbSZwtgw9wmJoojQn7Dlz5OL67vICw
Robin
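As a toy illustration of the density-versus-separation idea, the sketch below scores a clustering by dividing the smallest centroid-to-centroid distance by the largest average point-to-centroid distance. This is a deliberately simplified stand-in and not the measure defined in the paper. The function names are hypothetical, and it assumes at least two non-empty clusters.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dims))


def mean_spread(points):
    """Average distance from each point to its centroid (lower is denser)."""
    center = centroid(points)
    return sum(distance(p, center) for p in points) / len(points)


def separation_over_spread(clusters):
    """Simplified quality score: higher means tighter, better separated.

    Assumption: this ratio only stands in for the density/separation
    trade-off; it is not the paper's formula.
    """
    centers = [centroid(c) for c in clusters]
    separation = min(
        distance(centers[i], centers[j])
        for i in range(len(centers))
        for j in range(i + 1, len(centers))
    )
    spread = max(mean_spread(c) for c in clusters)
    return separation / spread if spread > 0 else math.inf


# Two hypothetical clusterings with the same spread but different separation.
far_apart = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
close_together = [[(0, 0), (0, 2)], [(2, 0), (2, 2)]]
far_score = separation_over_spread(far_apart)
close_score = separation_over_spread(close_together)
```

Rerunning a score like this for each setting of the clustering parameters gives a crude way to compare runs against one another.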
