Hello,

While doing some work for the Open Relevance Project, I thought that a large
corpus of categorized documents might be useful test data for Mahout.

Here is one I am working with:
http://ece.ut.ac.ir/DBRG/Hamshahri/ (approximately 160k categorized docs)
There is a newer beta version here:
http://ece.ut.ac.ir/DBRG/Hamshahri/ham2/ (approximately 320k categorized docs)

-- 
Robert Muir
[email protected]
