Need a set of documents checked in to mahout trunk

Robin Anil Tue, 09 Feb 2010 06:12:12 -0800

I think we should check a small set of text documents in to Mahout: maybe 3-4
categories with 10 documents each.
They could be used for testing clustering, classification, the vectorizer,
collocations, and even frequent pattern generation.


And instead of running artificial tests, each of these can use the documents to
test against a reference implementation written in the test class, like what
the k-means test does.
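
As a minimal sketch of that pattern (this is not Mahout code; the class name,
method names, and the tiny hand-made vectors standing in for vectorized
documents are all hypothetical), a test can keep a deliberately naive reference
next to the code under test and require that both agree on the same fixed
input:

```java
import java.util.Arrays;

public class ReferenceCheckSketch {

    // Hypothetical fixture: four "documents" already turned into 2-d vectors.
    static final double[][] POINTS = {
        {1.0, 0.0}, {0.9, 0.1}, {0.0, 1.0}, {0.1, 0.8}
    };
    // Hypothetical fixed centroids, one per category.
    static final double[][] CENTROIDS = {
        {1.0, 0.0}, {0.0, 1.0}
    };

    // Naive reference: label each point with the index of the centroid
    // that has the smallest Euclidean distance to it.
    static int[] referenceAssign(double[][] points, double[][] centroids) {
        int[] labels = new int[points.length];
        for (int i = 0; i < points.length; i++) {
            double best = Double.MAX_VALUE;
            for (int c = 0; c < centroids.length; c++) {
                double sum = 0.0;
                for (int d = 0; d < points[i].length; d++) {
                    double diff = points[i][d] - centroids[c][d];
                    sum += diff * diff;
                }
                double dist = Math.sqrt(sum);
                if (dist < best) {
                    best = dist;
                    labels[i] = c;
                }
            }
        }
        return labels;
    }

    // Stand-in for the implementation under test (an assumption for this
    // sketch): same assignment rule, but it compares squared distances
    // and so skips the square root.
    static int[] assignUnderTest(double[][] points, double[][] centroids) {
        int[] labels = new int[points.length];
        for (int i = 0; i < points.length; i++) {
            double best = Double.MAX_VALUE;
            for (int c = 0; c < centroids.length; c++) {
                double sq = 0.0;
                for (int d = 0; d < points[i].length; d++) {
                    double diff = points[i][d] - centroids[c][d];
                    sq += diff * diff;
                }
                if (sq < best) {
                    best = sq;
                    labels[i] = c;
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        int[] expected = referenceAssign(POINTS, CENTROIDS);
        int[] actual = assignUnderTest(POINTS, CENTROIDS);
        // The check passes only if both implementations agree on every point.
        if (!Arrays.equals(expected, actual)) {
            throw new AssertionError("implementation disagrees with reference");
        }
        System.out.println("labels match: " + Arrays.toString(actual));
    }
}
```

With a checked-in document set, the fixture arrays would be replaced by vectors
built from those documents, and the same comparison would double as a baseline.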

Plus we will have a baseline against which we can measure improvements in these
algorithms. Any ideas for a good (legally sound :)) dataset that we can
use?

The same idea can be extended to CF (collaborative filtering) as well.


Robin
