The method that updates a document (the reindexDocument method) is not
optimized. Its current behavior is:
1- open the reader if it is not open (in practice it is always closed, because of step 3)
2- delete the document
3- close the reader
4- open the writer
5- write the document to the index
6- close the writer
(NOTE: with this behavior the merge factor is useless, because this
method indexes only one document each time the IndexWriter is opened.)
A common optimization in Lucene is to avoid opening and closing the
IndexReader and IndexWriter many times.
So I propose this simple optimization:
1- open the reader if it is not open
2- delete the document
3- store the Lucene document in a buffer (Stack)
// flush the buffer once it is full
if (buffer.size() == max_buffer) {
    // switch to write mode
    4- close the reader
    5- open the writer
    for (each document popped from the buffer) {
        6- write the document
    }
    7- close the writer
}
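
The steps above can be sketched in Java roughly as follows. This is only a minimal illustration, not the actual component: the Backend interface, CountingBackend, and BufferedReindexer are hypothetical names that stand in for the real Lucene IndexReader/IndexWriter calls, so the example runs without Lucene and simply counts how often each mode is opened. It also assumes a FIFO ArrayDeque instead of a Stack, so that documents are written in the order they arrived.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Sketch of the buffered reindex strategy. All names here are hypothetical. */
class BufferedReindexer {

    /** Hypothetical stand-in for the reader/writer operations on an index. */
    interface Backend {
        void openReader();
        void deleteDocument(String id);
        void closeReader();
        void openWriter();
        void writeDocument(String id, String content);
        void closeWriter();
    }

    /** Backend that only counts operations, so the sketch runs without Lucene. */
    static class CountingBackend implements Backend {
        int readerOpens, writerOpens, deleted, written;
        public void openReader() { readerOpens++; }
        public void deleteDocument(String id) { deleted++; }
        public void closeReader() { }
        public void openWriter() { writerOpens++; }
        public void writeDocument(String id, String content) { written++; }
        public void closeWriter() { }
    }

    private final Backend backend;
    private final int maxBuffer;
    // Each entry is {id, content}; used as a FIFO queue.
    private final Deque<String[]> buffer = new ArrayDeque<>();
    private boolean readerOpen = false;

    BufferedReindexer(Backend backend, int maxBuffer) {
        if (maxBuffer < 1) throw new IllegalArgumentException("maxBuffer must be >= 1");
        this.backend = backend;
        this.maxBuffer = maxBuffer;
    }

    /** Steps 1-3: delete the old version now, buffer the new version for later. */
    void reindex(String id, String content) {
        if (!readerOpen) {              // 1- open the reader if it is not open
            backend.openReader();
            readerOpen = true;
        }
        backend.deleteDocument(id);     // 2- delete the document
        buffer.addLast(new String[] {id, content}); // 3- store it in the buffer
        if (buffer.size() >= maxBuffer) {
            flush();                    // buffer is full: switch to write mode
        }
    }

    /** Steps 4-7: one reader close and one writer open/close for the whole buffer. */
    void flush() {
        if (readerOpen) {               // 4- close the reader
            backend.closeReader();
            readerOpen = false;
        }
        if (buffer.isEmpty()) return;   // nothing to write
        backend.openWriter();           // 5- open the writer
        while (!buffer.isEmpty()) {     // 6- write every buffered document
            String[] doc = buffer.pollFirst();
            backend.writeDocument(doc[0], doc[1]);
        }
        backend.closeWriter();          // 7- close the writer
    }

    public static void main(String[] args) {
        CountingBackend backend = new CountingBackend();
        BufferedReindexer indexer = new BufferedReindexer(backend, 100);
        for (int i = 0; i < 250; i++) {
            indexer.reindex("doc" + i, "content " + i);
        }
        indexer.flush(); // write the last, partially filled buffer
        System.out.println("writer opened " + backend.writerOpens
                + " times for " + backend.written + " documents");
    }
}
```

With 250 documents and a buffer of 100, this sketch opens the writer 3 times (two full buffers plus the final flush) instead of 250 times. Note that the caller has to call flush() at the end, otherwise the last partial buffer is never written, and that the sketch does not deduplicate: a document reindexed twice before a flush would be written twice.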
This kind of method has two benefits:
1- with a buffer of 100 documents, the number of read/write mode
switches is divided by 100, and indexing is much faster
2- the merge factor becomes really useful, because the IndexWriter indexes
more than one document each time it is opened
I've developed an Index component with 2 implementations:
1- indexerDefault, which uses this kind of method
2- MultiThreadIndexer, optimized for multiple CPUs
Maybe it would be interesting to integrate these components into the Lucene block.
Nicolas Maisonneuve