Re: MAHOUT-236 Cluster Evaluation Tools?

Robin Anil Wed, 07 Apr 2010 21:21:44 -0700

On Wed, Apr 7, 2010 at 11:50 PM, Jeff Eastman <j...@windwardsolutions.com> wrote:


> Hi Robin,
>
> Interesting paper. I'm beginning to see how to MR the representative point
> selection already. The rest will hopefully become clearer with more study.
> Lots of MR jobs are needed to:



> a) get the data into Vectors,

We have something for text, missing for other formats.

> b) iterate (e.g. kmeans) over the data to produce a set of clusters,

Done.

> c) cluster the data,

Done.

> d) iterate over the clustered data to derive representative points for each
> cluster, and finally

Done ;)

> e) produce the CDbw.

TODO
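For step (d), here is a rough single-machine sketch of picking representative points for one cluster. It assumes a farthest-first selection: the first representative is the point farthest from the centroid, and each later one is the point farthest from the representatives chosen so far. That rule, and the names `centroid` and `representative_points`, are assumptions for illustration and not taken from the paper or from Mahout. In a map/reduce job the function body would roughly play the role of a reducer keyed by cluster id.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def representative_points(points, num_reps):
    """Pick up to num_reps points from one cluster, farthest-first.

    Assumption: the first pick is the point farthest from the centroid;
    each later pick maximises its distance to the nearest point already
    picked. Fewer points are returned if the cluster runs out of
    distinct candidates.
    """
    center = centroid(points)
    reps = [max(points, key=lambda p: math.dist(p, center))]
    while len(reps) < num_reps:
        candidates = [p for p in points if p not in reps]
        if not candidates:
            break
        reps.append(
            max(candidates, key=lambda p: min(math.dist(p, r) for r in reps))
        )
    return reps


cluster = [[0, 0], [1, 0], [0, 1], [4, 4]]
print(representative_points(cluster, 2))  # [[4, 4], [0, 0]]
```

The points end up spread around the edge of the cluster, which is the kind of input a density/separation measure would need next.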




> And, of course all of this is again iterated with different values for the
> clustering algorithm's parameters. Should keep the lights on at PG&E
> producing power for the server farms.
>
>
>
> Robin Anil wrote:
>
>> Hi Jeff,
>>            This is a good paper with a simple measure of cluster quality
>> based on intra-cluster density and inter-cluster separation. It's
>> pretty easy to compute. We need to make it a map/reduce job.
>>
>> http://docs.google.com/viewer?a=v&q=cache:z5p9n04cBQEJ:www.db-net.aueb.gr/index.php/corporate/content/download/227/833/file/HV_poster2002.pdf+clustering+quality&hl=en&gl=in&pid=bl&srcid=ADGEESiC-ocW6IWrKR4cb1t1ZqkzRKQ3tDv4UFBkVaUKU0gG3kADcPWIjs-60A0912nu8MFPsVM3pf9jKrP98dL-B-BaiOC9LObBS3VkJK6Mu6josZtVegLxp3BftduD3hFxtGOVZK_b&sig=AHIEtbSZwtgw9wmJoojQn7Dlz5OL67vICw
>> Robin
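To make the density-versus-separation idea concrete, here is a much simplified stand-in score. It is not the CDbw formula from the paper. It just divides the smallest centroid-to-centroid distance (separation) by the average point-to-centroid distance (scatter, used here as a crude inverse of density). The names `mean_scatter` and `quality_score` are hypothetical.

```python
import math
from itertools import combinations


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def mean_scatter(points):
    """Average distance from each point to its cluster centroid."""
    c = centroid(points)
    return sum(math.dist(p, c) for p in points) / len(points)


def quality_score(clusters):
    """Smallest centroid distance divided by average scatter.

    Higher means tighter, better separated clusters. This is a
    simplified illustration only, not CDbw.
    """
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    centers = [centroid(c) for c in clusters]
    separation = min(math.dist(a, b) for a, b in combinations(centers, 2))
    scatter = sum(mean_scatter(c) for c in clusters) / len(clusters)
    return separation / scatter if scatter > 0 else math.inf


# Same four points, grouped two different ways.
good = [[[0, 0], [0, 2]], [[10, 0], [10, 2]]]
bad = [[[0, 0], [10, 0]], [[0, 2], [10, 2]]]
print(quality_score(good) > quality_score(bad))  # True
```

Running the clustering with different parameters and comparing scores like this is the outer loop Jeff describes. A real CDbw job would replace the centroid-only terms with ones built on the representative points.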
>>
>>
>>
>>
>
>
