A few references:
http://www.daviddlewis.com/resources/testcollections/reuters21578/
http://www.daviddlewis.com/resources/testcollections/reuters21578/readme.txt
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.html

sv

On Wed, 7 Apr 2004, Matt Quail wrote:

> Hi all,
>
> I'm doing a presentation to my local JUG on Lucene, and I'm looking for
> a "good" set of documents to use as a demonstration.
>
> Ideally it would be:
> 1) large (10,000 plus?).
> 2) contain some metadata besides "body" (like author, date, primarykey,
> etc).
> 3) freely available.
>
> I was going to use the data from the previous Google programming
> contest, but it doesn't seem to be available.
>
> If I can't find anything satisfactory, I'll probably:
> - generate a fake whitepages phonebook
> - grab documents from project Gutenberg
>
> My preference is for some "real" data, but I'm happy to generate fake
> data if no-one has any better ideas.
>
> :D
>
> =Matt