Dmitry,
Yep, you're right, Dmitry. Switching the compound file option off and on would be the trick to get the same behavior I described. I ran some tests on that and found that it works perfectly. I think we can leave everything as it is; maybe we should document it somewhere.
Is there something like a "tips and tricks" section on the Lucene website?
Bernhard
Dmitry Serebrennikov wrote:
Bernhard Messer wrote:
hi developers,
there may be a small but effective way to optimize the SegmentMerger class when the compound file option is enabled, which has been the default since Lucene 1.4.
The current implementation creates and writes the compound index file every time the merge() method is called. Because I/O operations are expensive and time-consuming, it would be better to write the compound index file only when optimizing the index. The change itself wouldn't be a big deal: add a boolean parameter, giving SegmentMerger.merge(boolean finalize). The compound file is created only if finalize==true and the compound option is enabled. To complete the implementation, the same parameter could be added to mergeSegments(int minSegment, boolean finalize) within IndexWriter. When mergeSegments is called from flushRamSegments() or maybeMergeSegments(), finalize is set to false. Only when it is called from optimize() is finalize set to true and the compound file written.
The downside is having to explain to developers that the compound file option has no effect unless they optimize the index before closing it. The other issue is that we might run into the problem of too many open files, which was sometimes reported before the compound option was introduced.
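As a rough illustration of the proposal above, here is a minimal toy model in Java. It is not Lucene code: ToyWriter, its deferCompound switch and countCompoundWrites are hypothetical stand-ins, and the model ignores cascading merges. It only counts how often a compound file would be written with and without the proposed finalize flag.

```java
// Toy model only: none of these classes are Lucene's.
class FinalizeMergeSketch {

    /** Hypothetical stand-in for IndexWriter; real merging is omitted. */
    static class ToyWriter {
        private final boolean useCompoundFile;
        private final boolean deferCompound; // true = proposed behaviour
        private final int mergeFactor;
        private int buffered = 0;
        int compoundWrites = 0;

        ToyWriter(boolean useCompoundFile, boolean deferCompound, int mergeFactor) {
            this.useCompoundFile = useCompoundFile;
            this.deferCompound = deferCompound;
            this.mergeFactor = mergeFactor;
        }

        // Simplification: one merge after every mergeFactor documents,
        // standing in for flushRamSegments()/maybeMergeSegments().
        void addDocument() {
            buffered++;
            if (buffered >= mergeFactor) {
                buffered = 0;
                mergeSegments(false);
            }
        }

        // The final merge is the only caller that passes finalize == true.
        void optimize() {
            buffered = 0;
            mergeSegments(true);
        }

        private void mergeSegments(boolean finalize) {
            // Current behaviour: write the compound file on every merge.
            // Proposed behaviour: write it only when finalize is true.
            boolean write = useCompoundFile && (finalize || !deferCompound);
            if (write) {
                compoundWrites++;
            }
        }
    }

    /** Indexes the given number of documents, optimizes, and returns the write count. */
    static int countCompoundWrites(int docs, int mergeFactor, boolean deferCompound) {
        ToyWriter writer = new ToyWriter(true, deferCompound, mergeFactor);
        for (int i = 0; i < docs; i++) {
            writer.addDocument();
        }
        writer.optimize();
        return writer.compoundWrites;
    }

    public static void main(String[] args) {
        System.out.println("current:  " + countCompoundWrites(100, 10, false));
        System.out.println("proposed: " + countCompoundWrites(100, 10, true));
    }
}
```

In this simplified model the saving grows with the number of intermediate merges, since only the optimize() call still writes a compound file.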
Yeah, that was kind of the point of having the compound files: to avoid too many file handles, especially during indexing. I hear you on the inefficient use of disk I/O, though.
The downside could be addressed by making the optimization optionally available through IndexWriter. Developers using Lucene could then decide for themselves whether they want to use the "single compound write" option.
One could do that today. Just call setUseCompoundFile(false) during indexing and setUseCompoundFile(true) before the final optimize. Would that do the trick?
Dmitry.
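The workaround Dmitry describes can be sketched with a small toy model in Java. Again, this is not Lucene code: ToyWriter and its setCompound method are hypothetical stand-ins for IndexWriter and its compound-file setter, and the merge policy is reduced to one merge per mergeFactor documents.

```java
// Toy model only: ToyWriter and setCompound are hypothetical, not Lucene API.
class CompoundToggleSketch {

    static class ToyWriter {
        private boolean compound = true; // compound files are on by default
        private final int mergeFactor;
        private int buffered = 0;
        int compoundWrites = 0;

        ToyWriter(int mergeFactor) {
            this.mergeFactor = mergeFactor;
        }

        // Stand-in for the IndexWriter switch mentioned in the message above.
        void setCompound(boolean value) {
            compound = value;
        }

        void addDocument() {
            buffered++;
            if (buffered >= mergeFactor) {
                buffered = 0;
                merge();
            }
        }

        void optimize() {
            buffered = 0;
            merge();
        }

        // Every merge writes a compound file while the option is enabled.
        private void merge() {
            if (compound) {
                compoundWrites++;
            }
        }
    }

    /** Indexes with the option off, re-enables it, optimizes, and returns the write count. */
    static int indexWithToggle(int docs, int mergeFactor) {
        ToyWriter writer = new ToyWriter(mergeFactor);
        writer.setCompound(false);      // no compound file during indexing
        for (int i = 0; i < docs; i++) {
            writer.addDocument();
        }
        writer.setCompound(true);       // re-enable before the final merge
        writer.optimize();
        return writer.compoundWrites;
    }

    public static void main(String[] args) {
        System.out.println("compound writes: " + indexWithToggle(100, 10));
    }
}
```

The trade-off raised earlier in the thread still applies: while the option is off, each segment keeps its separate files, so the number of open file handles during indexing can grow.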
If you would like to see the patch, leave me a note and I'll create it.
best regards Bernhard
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]