On Wed, Apr 7, 2010 at 11:50 PM, Jeff Eastman <j...@windwardsolutions.com> wrote:
> Hi Robin,
>
> Interesting paper. I'm beginning to see how to MR the representative point
> selection already. The rest will hopefully become clearer with more study.
> Lots of MR jobs are needed to:
> a) get the data into Vectors,

We have something for text, missing for other formats

> b) iterate (e.g. kmeans) over the data to produce a set of clusters,

Done

> c) cluster the data,

Done

> d) iterate over the clustered data to derive representative points for each
> cluster, and finally

Done ;)

> e) produce the CDbw.

TODO

> And, of course, all of this is again iterated with different values for the
> clustering algorithm's parameters. Should keep the lights on at PG&E
> producing power for the server farms.
>
> Robin Anil wrote:
>
>> Hi Jeff,
>> This is a good paper with a simple measure of cluster quality
>> based on intra cluster density and inter cluster separation. It's
>> pretty easy to compute. Need to make it a map/reduce job
>>
>> http://docs.google.com/viewer?a=v&q=cache:z5p9n04cBQEJ:www.db-net.aueb.gr/index.php/corporate/content/download/227/833/file/HV_poster2002.pdf+clustering+quality&hl=en&gl=in&pid=bl&srcid=ADGEESiC-ocW6IWrKR4cb1t1ZqkzRKQ3tDv4UFBkVaUKU0gG3kADcPWIjs-60A0912nu8MFPsVM3pf9jKrP98dL-B-BaiOC9LObBS3VkJK6Mu6josZtVegLxp3BftduD3hFxtGOVZK_b&sig=AHIEtbSZwtgw9wmJoojQn7Dlz5OL67vICw
>> Robin
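
For reference, step d) above (deriving representative points for each cluster) could be sketched per cluster roughly as below. This is only a sketch under assumptions, not the Mahout code: it assumes a farthest-first heuristic (first the point farthest from the centroid, then repeatedly the point farthest from those already chosen), plain Euclidean distance, and in-memory Python tuples instead of Mahout Vectors. The names `centroid`, `distance` and `representative_points` are made up for illustration.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    dim = len(points[0])
    return [sum(p[i] for p in points) / n for i in range(dim)]


def distance(a, b):
    """Euclidean distance (assumed here; any distance measure could be used)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def representative_points(points, k):
    """Pick up to k representative points of one cluster, farthest-first.

    Hypothetical helper: the first pick is the point farthest from the
    cluster centroid; each later pick is the point whose nearest
    already-chosen representative is farthest away.
    """
    if not points or k <= 0:
        return []
    center = centroid(points)
    remaining = list(points)
    first = max(remaining, key=lambda p: distance(p, center))
    reps = [first]
    remaining.remove(first)
    while remaining and len(reps) < k:
        nxt = max(remaining, key=lambda p: min(distance(p, r) for r in reps))
        reps.append(nxt)
        remaining.remove(nxt)
    return reps


# One small cluster given as tuples; ask for two representatives.
cluster = [(0, 0), (1, 0), (0, 1), (10, 10)]
print(representative_points(cluster, 2))  # -> [(10, 10), (0, 0)]
```

In an MR setting something like this would presumably run once per cluster on the reduce side, with the cluster id as the key; that part is left out here.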