Grant Ingersoll wrote:
Anyone have any sample code or demo of running the clustering over a
large collection of documents that they could share? Mainly looking
for an example of taking some corpus, converting it into the
appropriate Mahout representation and then running either the k-means
or the canopy clustering on it.
Thanks,
Grant
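Not Mahout code, but for reference, the k-means iteration that such a job would distribute over Hadoop can be sketched as a toy in-memory Java example (class and method names here are made up for illustration; Mahout's actual drivers and vector representation differ):

```java
// Toy 1-D k-means (Lloyd's algorithm) -- illustrates the assign/recompute
// loop that a distributed k-means job parallelizes. NOT Mahout's API.
public class ToyKMeans {

    // One pass: assign each point to its nearest center, then recompute
    // each center as the mean of the points assigned to it.
    static double[] step(double[] points, double[] centers) {
        double[] sums = new double[centers.length];
        int[] counts = new int[centers.length];
        for (double p : points) {
            int best = 0;
            for (int c = 1; c < centers.length; c++) {
                if (Math.abs(p - centers[c]) < Math.abs(p - centers[best])) {
                    best = c;
                }
            }
            sums[best] += p;
            counts[best]++;
        }
        double[] next = new double[centers.length];
        for (int c = 0; c < centers.length; c++) {
            // Keep an empty cluster's center where it was.
            next[c] = counts[c] == 0 ? centers[c] : sums[c] / counts[c];
        }
        return next;
    }

    // Iterate a fixed number of passes (real jobs also test convergence).
    static double[] cluster(double[] points, double[] centers, int iterations) {
        for (int i = 0; i < iterations; i++) {
            centers = step(points, centers);
        }
        return centers;
    }
}
```

In the Hadoop version, the assignment step is the map phase and the mean recomputation is the reduce phase, with one such job per iteration.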
I've been experimenting with Hadoop deployments on EC2 and have managed to
deploy a single-node cluster using an AMI I built from the latest trunk
version (0.18.0). I'm waiting for 0.17.0 to be released, since it has
much nicer DNS support for deploying EC2 clusters than 0.16.x. At that
point there should be a public 0.17.0 AMI that we can all use. I could
probably hack the scripts to make mine work, but that is a little outside
my comfort zone and 0.17 is imminent.
If we can identify some datasets that can be easily downloaded, I will
put copies in S3 so that they can be easily copied into the cloud once
that is ready. I've run canopy over some Apache logs in my previous life,
but the kinds of datasets under discussion sound much more interesting.
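For anyone unfamiliar with canopy, the single-machine version of the idea is small enough to sketch in a few lines of Java (a toy on 1-D points with invented names, not the Mahout implementation, which runs this over Hadoop):

```java
import java.util.ArrayList;
import java.util.List;

// Toy canopy clustering on 1-D points. Canopy uses two distance
// thresholds T1 > T2: a point within T2 of an existing canopy center is
// consumed by that canopy and removed from further consideration; points
// within T1 would additionally be added to the canopy's membership (this
// sketch only tracks the centers). NOT Mahout's API.
public class ToyCanopy {

    static List<Double> canopyCenters(List<Double> points, double t1, double t2) {
        List<Double> remaining = new ArrayList<>(points);
        List<Double> centers = new ArrayList<>();
        while (!remaining.isEmpty()) {
            // The next unconsumed point seeds a new canopy.
            double center = remaining.remove(0);
            centers.add(center);
            // Points tightly bound to this canopy (within T2) are consumed.
            remaining.removeIf(p -> Math.abs(p - center) < t2);
        }
        return centers;
    }
}
```

The resulting canopy centers are a cheap way to seed k-means, which is why the two algorithms are often run as a pipeline.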
Jeff