A few references (the Reuters-21578 and 20 Newsgroups collections):

http://www.daviddlewis.com/resources/testcollections/reuters21578/
http://www.daviddlewis.com/resources/testcollections/reuters21578/readme.txt
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.html

sv

On Wed, 7 Apr 2004, Matt Quail wrote:

> Hi all,
> 
> I'm doing a presentation to my local JUG on Lucene, and I'm looking for 
> a "good" set of documents to use as a demonstration.
> 
> Ideally it would be:
> 1) large (10,000 plus?).
> 2) contain some metadata besides "body" (like author, date, primary key, 
> etc).
> 3) freely available.
> 
> I was going to use the data from the previous Google programming 
> contest, but it doesn't seem to be available.
> 
> If I can't find anything satisfactory, I'll probably:
> - generate a fake whitepages phonebook
> - grab documents from Project Gutenberg
> 
> My preference is for some "real" data, but I'm happy to generate fake 
> data if no-one has any better ideas.
> 
> :D
> 
> =Matt
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 

