Time taken in Indexing when the index is already huge

Goel, Nikhil Mon, 04 Apr 2005 19:28:52 -0700

Hi,


I have been using lucene-1.3.jar for quite some time, together with another 
library that stores the index in a database. 

When we started indexing, writer.optimize() took 600-800 milliseconds to 
return. Our index has since grown to around 10 MB, and writer.optimize() now 
takes 30-40 seconds, which is not acceptable for our solution. I timed the 
individual calls, and writer.optimize() is the one that accounts for most of 
this time. 

 

So I am wondering whether anyone else has faced the same problem when indexing 
into an index that is already large, or whether there is another way to manage 
such an index.

 

Here is the simple code we use to index the data:

// Open the existing index (third argument: create = false)
IndexWriter writer = new IndexWriter(dbDirectory, new StandardAnalyzer(), false);

// doc is of type org.apache.lucene.document.Document
writer.addDocument(doc);

// optimize() is called on the IndexWriter; this is the call that takes
// most of the time and is responsible for the delay
writer.optimize();

// Close the IndexWriter
writer.close();
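
A minimal sketch of how each call above can be timed separately, to confirm which one dominates. It assumes Java 8 or later; StepTimer and timeMillis are hypothetical names and are not part of Lucene, and the sleep in main is only a stand-in for the real writer.optimize() call.

```java
public class StepTimer {

    /** Runs the given step and returns the elapsed wall-clock time in milliseconds. */
    public static long timeMillis(Runnable step) {
        long start = System.nanoTime();
        step.run();
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        // Stand-in for the slow call; in the real code this would wrap
        // writer.optimize() (hypothetical usage, not tested against Lucene).
        long elapsed = timeMillis(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("step took " + elapsed + " ms");
    }
}
```

Wrapping addDocument(), optimize() and close() in separate timeMillis calls gives one number per step, which makes it easy to see how the optimize() share grows with the index size.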

 

 

The time taken by the optimize() call grows considerably as the index gets 
larger. I tried to look this up in Erik Hatcher and Otis Gospodnetić's book 
<http://www.manning.com/hatcher2#author#author> as well, but everywhere it 
says that Lucene is quite scalable and has no trouble indexing even huge 
amounts of data. Can anyone please provide some insight into this?

 

Thanks.

Nikhil
