Re: Clustering Demo

Karl Wettin Thu, 08 May 2008 08:57:06 -0700

Grant Ingersoll wrote:

Anyone have any sample code or demo of running the clustering over a large collection of documents that they could share? Mainly looking for an example of taking some corpus, converting it into the appropriate Mahout representation and then running either the k-means or the canopy clustering on it.


There is the rule-based data set generation in MAHOUT-43.

http://www.datasetgenerator.com

Push a few buttons and you have an insane amount of OK test data according to your specifications. That is what I have been using.

There is also this contact I have with these guys who produce news article data for indexing. The data is nicely organized, and they have previously offered to look into giving committers access to it for local tests.

I have a number of data sets whose ownership I'm not certain about. For instance, I've been gathering real estate data for Sweden for some time, as the sites I was using to find an apartment did not work the way I wanted them to :)

          karl
