Thank you

-----Original Message-----
From: Carsten Schnober [mailto:[email protected]] 
Sent: Monday, December 24, 2012 3:25 PM
To: [email protected]
Subject: Re: Lucene 4.0 scalability and performance.

On 23.12.2012 12:11, [email protected] wrote:


> This means that we need to index millions of documents with terabytes of 
> content and search in them.
> For now we want to define only one indexed field, containing the content of 
> the documents, with the possibility to search for terms and retrieve their 
> offsets.
> Has anybody already tested Lucene with terabytes of data?
> Does Lucene have any known limitations on the number or size of indexed 
> documents?
> What about search performance on huge data sets?

Hi Vitali,
we've been working on a linguistic search engine based on Lucene 4.0 and have 
performed a few tests with large text corpora. There is at least some overlap 
with the functionality you mention (term offsets). See 
http://www.oegai.at/konvens2012/proceedings/27_schnober12p/ (mainly section 5).
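For what it's worth, a single content field with retrievable offsets might be 
set up roughly like this under the Lucene 4.x API (a minimal sketch, assuming 
offsets are kept in the postings via DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS; 
the field name "content" and the class name are illustrative):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.FieldInfo.IndexOptions;

public class ContentFieldSketch {
    public static void main(String[] args) {
        // One indexed (not stored) content field; offsets are written into
        // the postings lists so they can be read back at search time.
        FieldType contentType = new FieldType(TextField.TYPE_NOT_STORED);
        contentType.setIndexOptions(
            IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
        contentType.freeze();

        Document doc = new Document();
        doc.add(new Field("content", "the document text ...", contentType));
    }
}
```

At search time the offsets can then be pulled from the postings (e.g. via 
DocsAndPositionsEnum with the offsets flag) without storing term vectors.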
Carsten

--
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP                 | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789      | [email protected]
Next Generation Corpus Analysis Platform

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

