Thank you

-----Original Message-----
From: Carsten Schnober [mailto:[email protected]]
Sent: Monday, December 24, 2012 3:25 PM
To: [email protected]
Subject: Re: Lucene 4.0 scalability and performance.
On 23.12.2012 12:11, [email protected] wrote:
> This means that we need to index millions of documents with terabytes of
> content and search in it.
> For now we want to define only one indexed field, containing the content of
> the documents, with the possibility to search terms and retrieve the term
> offsets.
> Has somebody already tested Lucene with terabytes of data?
> Does Lucene have any known limitations related to the number or size of
> indexed documents?
> What about search performance on huge data sets?

Hi Vitali,
we've been working on a linguistic search engine based on Lucene 4.0 and have performed a few tests with large text corpora. There are at least some overlaps with the functionality you mentioned (term offsets). See http://www.oegai.at/konvens2012/proceedings/27_schnober12p/ (mainly section 5).

Carsten
--
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP                 | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789      | [email protected]

Korpusanalyseplattform der nächsten Generation
Next Generation Corpus Analysis Platform

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
---------------------------------------------------------------------
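[For readers finding this thread later: in Lucene 4.0 the term offsets mentioned above can be recorded either directly in the postings lists or in term vectors, by configuring the field type at indexing time. A minimal configuration sketch follows; the field name "content" and the class name are placeholders taken from the question, not from any project mentioned in this thread.]

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.FieldInfo.IndexOptions;

public class OffsetFieldSketch {
    public static void main(String[] args) {
        // Start from the standard analyzed, non-stored text field type.
        FieldType ft = new FieldType(TextField.TYPE_NOT_STORED);
        // Record character offsets in the postings (new in Lucene 4.0);
        // they can later be read per term occurrence via
        // DocsAndPositionsEnum.startOffset() / endOffset().
        ft.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
        // Alternatively (or additionally), offsets can be kept in term vectors:
        // ft.setStoreTermVectors(true);
        // ft.setStoreTermVectorOffsets(true);
        ft.freeze();

        Document doc = new Document();
        doc.add(new Field("content", "the document text ...", ft));
        // doc would then be passed to IndexWriter.addDocument(doc).
    }
}
```

On the size question: a single Lucene index is limited to Integer.MAX_VALUE (about 2.1 billion) documents; collections beyond that are typically split across multiple indexes or shards.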
