Very cool, I've added these to our collections wiki: http://cwiki.apache.org/confluence/display/MAHOUT/Collections

On Nov 19, 2009, at 3:31 AM, Robert Muir wrote:

> Hello,
>
> While doing some work for the Open Relevance Project, I thought that a large
> corpus of categorized documents might be useful test data for Mahout.
>
> Here is one I am working with:
> http://ece.ut.ac.ir/DBRG/Hamshahri/ (approximately 160k categorized docs)
> There is a newer beta version here:
> http://ece.ut.ac.ir/DBRG/Hamshahri/ham2/ (approximately 320k categorized docs)
>
> --
> Robert Muir
> [email protected]
