On Sat, 2010-08-14 at 03:24 +0200, andynuss wrote:
> Let's say that I am indexing large book documents broken into chapters,
> a typical book that you buy at Amazon. What would be the approximate
> limit to the number of books that can be indexed slowly and searched
> quickly? The search unit would be a chapter, so assume that a book is
> divided into 15-50 chapters. Any ideas?
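The one-document-per-chapter scheme described above can be sketched roughly as follows. This is a minimal illustration, not a recommendation: the field names ("bookId", "chapter", "body") and the sample data are invented, and it assumes a recent Lucene API (IndexWriterConfig, ByteBuffersDirectory, TextField/StringField); older 3.x releases used a different Field constructor.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class ChapterIndexDemo {
    public static void main(String[] args) throws Exception {
        Directory dir = new ByteBuffersDirectory();
        IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());

        try (IndexWriter writer = new IndexWriter(dir, cfg)) {
            // One Lucene Document per chapter; the book id is repeated on
            // each chapter document so hits can be grouped back into books.
            for (int ch = 1; ch <= 3; ch++) {
                Document doc = new Document();
                doc.add(new StringField("bookId", "book-42", Field.Store.YES));
                doc.add(new StoredField("chapter", ch));
                doc.add(new TextField("body",
                        "chapter " + ch + " text about whales", Field.Store.NO));
                writer.addDocument(doc);
            }
        }

        try (IndexReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            // A search hit is a chapter, matching the scheme in the question.
            TopDocs hits = searcher.search(
                    new TermQuery(new Term("body", "whales")), 10);
            System.out.println("chapter hits: " + hits.totalHits.value);
        }
    }
}
```

Since every chapter becomes an independent document, the practical limit is the total number of chapter documents (books times 15-50), not the number of books as such.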
Hathi Trust has an excellent blog where they write about indexing 5 million+
scanned books: http://www.hathitrust.org/blogs

They focus on OCR'ed books, where dirty data is a big problem, but most of
their thoughts and solutions can be used for clean data too.

Regards,
Toke Eskildsen

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org