Hi Vitaly,

Anything by Tom Burton-West should interest you. He works on the HathiTrust digital library project <http://www.hathitrust.org>, which currently indexes 7TB of full-length books, e.g.:
"Practical Relevance Ranking for 10 Million Books" (paper) INEX 2012, September 2012, Rome, Italy <http://www.clef-initiative.eu/documents/71612/943abea5-6e48-48dd-ba89-72c174d001ef> "HathiTrust Large Scale Search: Scalability meets Usability" (slides) Code4Lib 2012, February 2012, Seattle, Washington <http://www.hathitrust.org/documents/HathiTrust-Code4Lib-201202.pptx> "Large-scale Search" (blog) <http://www.hathitrust.org/blogs/large-scale-search> Steve On Dec 23, 2012, at 6:11 AM, vitaly_arte...@mcafee.com wrote: > Hi all, > We start to evaluate Lucene 4.0 for using in the production environment. > This means that we need to index millions of document with TeraBytes of > content and search in it. > For now we want to define only one indexed field, contained the content of > the documents, with possibility to search terms and retrieving the terms > offsets. > Does somebody already tested Lucene with TerabBytes of data? > Does Lucene has some known limitations related to the indexed documents > number or to the indexed documents size? > What is about search performance in huge set of data? > Thanks in advance, Vitaly --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org