Hello,

While doing some work for the Open Relevance Project, I thought that a large corpus of categorized documents might be useful test data for Mahout.
Here is one I am working with: http://ece.ut.ac.ir/DBRG/Hamshahri/ (approximately 160k categorized docs)

There is a newer beta version here: http://ece.ut.ac.ir/DBRG/Hamshahri/ham2/ (approximately 320k categorized docs)

--
Robert Muir
[email protected]
