huge performance increase of LuceneIndexTransformer on large Lucene indexes
---------------------------------------------------------------------------

                 Key: COCOON-2065
                 URL: https://issues.apache.org/jira/browse/COCOON-2065
             Project: Cocoon
          Issue Type: Improvement
          Components: Blocks: Lucene
    Affects Versions: 2.1.10, 2.1.9, 2.1.8, 2.1.7, 2.1.6, 2.1.11-dev (Current 
SVN), 2.2-dev (Current SVN)
            Reporter: Dominique De Munck
            Priority: Minor
             Fix For: 2.1.11-dev (Current SVN)
         Attachments: LuceneIndexTransformer.patch

PROBLEM:

The LuceneIndexTransformer optimizes the Lucene index every time an entry is 
added to the index.
On a large index this slows indexing down enormously. If, for example, you use 
the transformer to update the index entry on every check-in of a document, 
each check-in becomes slow.

For example, on a Pentium IV 2.4 GHz with a Lucene index of 10,000 documents, 
the index update itself takes only about 60 ms, while the optimize call that 
follows it can take 7 seconds!


SOLUTION:

I've created a patch that introduces an option, "optimize-frequency", which 
determines how often the optimize call is made.
It defaults to 1 (the current behaviour). If a user sets it to 50, the index 
is optimized only about once every 50 updates, and so on.
If no optimization is wanted, set it to 0.
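
As a rough illustration of what the option buys, the timings quoted above can 
be turned into an average cost per update. This is only a back-of-the-envelope 
sketch: the class and method names are hypothetical, and it assumes that an 
optimize run always costs about 7 seconds no matter how many updates have 
accumulated since the previous one, which is a simplification.

```java
// Hypothetical helper, not part of the patch: estimates the average time per
// index update when optimize() runs once every `optimizeFrequency` updates.
class AmortizedOptimizeCost {

    static double averageMillisPerUpdate(double updateMillis,
                                         double optimizeMillis,
                                         int optimizeFrequency) {
        if (optimizeFrequency < 0) {
            throw new IllegalArgumentException("optimize-frequency must be >= 0");
        }
        if (optimizeFrequency == 0) {
            // 0 disables optimization, so only the update itself is paid for.
            return updateMillis;
        }
        // ASSUMPTION: the optimize cost is constant and spread evenly over
        // the updates between two optimize runs.
        return updateMillis + optimizeMillis / optimizeFrequency;
    }

    public static void main(String[] args) {
        // Timings from the report: ~60 ms per update, ~7000 ms per optimize.
        System.out.println(averageMillisPerUpdate(60, 7000, 1));
        System.out.println(averageMillisPerUpdate(60, 7000, 50));
        System.out.println(averageMillisPerUpdate(60, 7000, 0));
    }
}
```

Under these assumptions the default of 1 costs roughly 7060 ms per update, 
while a value of 50 brings the average down to roughly 200 ms.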

This is consistent with the Lucene documentation (fragment of the Lucene FAQ):

"The IndexWriter class supports an optimize() method that compacts the index 
database and speedup queries. You may want to use this method after performing 
a complete indexing of your document set or after incremental updates of the 
index. If your incremental update adds documents frequently, you want to 
perform the optimization only once in a while to avoid the extra overhead of 
the optimization."

PATCH INFO:


The patch adds the configuration option plus a function "needToOptimize()", 
which is called before optimizing.
To keep the code simple, needToOptimize() uses a random number generator to 
decide whether to optimize.
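
The idea behind such a randomized check could be sketched as follows. This is 
not the code from the attached patch: the class name, the constructor and the 
field names are hypothetical, and only the method name needToOptimize() and 
the meaning of the values 0, 1 and N are taken from the description above.

```java
import java.util.Random;

// Hypothetical sketch, not the patch itself: decides at random whether the
// index should be optimized, so that optimize runs on average once every
// `optimizeFrequency` updates without keeping a counter.
class OptimizeFrequencySketch {

    private final int optimizeFrequency;
    private final Random random;

    OptimizeFrequencySketch(int optimizeFrequency, Random random) {
        if (optimizeFrequency < 0) {
            throw new IllegalArgumentException("optimize-frequency must be >= 0");
        }
        this.optimizeFrequency = optimizeFrequency;
        this.random = random;
    }

    boolean needToOptimize() {
        if (optimizeFrequency == 0) {
            return false; // 0 means: never optimize
        }
        if (optimizeFrequency == 1) {
            return true;  // 1 means: optimize on every update (old behaviour)
        }
        // nextInt(n) is uniform over 0..n-1, so this is true with
        // probability 1/n, i.e. on average once every n calls.
        return random.nextInt(optimizeFrequency) == 0;
    }

    public static void main(String[] args) {
        OptimizeFrequencySketch check = new OptimizeFrequencySketch(50, new Random());
        int optimizations = 0;
        for (int update = 0; update < 10000; update++) {
            if (check.needToOptimize()) {
                optimizations++;
            }
        }
        System.out.println("optimize calls in 10000 updates: " + optimizations);
    }
}
```

A random check needs no state that has to survive between requests, which is 
presumably why it keeps the code simple. The trade-off is that the gap between 
two optimize runs varies; only the average is one in N.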

- when the option is not set, CODE WILL BE EXECUTED AS BEFORE
- tested on the 2.1.11 SVN branch; there are no differences in the "main" 
trunk, so the patch can be applied there as well
- API docs updated
- if the patch is accepted, I will also update the Wiki:

http://wiki.apache.org/cocoon/LuceneIndexTransformer


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
