Christoph Goller wrote:
Bernhard Messer wrote:
Hi Christoph,
I just reviewed TestCompoundFile.java, and you were absolutely right that the test will fail on Windows. The test is now changed so that a second file with identical data is created. This file can be used in the test cases for comparisons against the compound store. The modified test now runs fine on both Windows and Linux platforms.
In the attachment you'll find the new TestCompoundFile source.
Hope this helps,
Bernhard
Hi Bernhard,
I reconsidered your changes again. The problem being solved is the following:
If compound files are used, Lucene needs up to 3 times the disk space of the final index during indexing. The reason is that during a merge of mergeFactor segments, those segments are duplicated by merging them into a new one, and the new segment is then duplicated again while its compound file is generated.
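The arithmetic behind the factor of 3 (and the hoped-for factor of 2) can be made concrete with a small sketch. The sizes and names below are purely illustrative, not Lucene internals; it just models how many copies of the merged data exist on disk at once under each ordering.

```java
// Toy model of peak disk usage when merging segments totalling s bytes.
public class MergeDiskUsage {
    public static void main(String[] args) {
        long s = 100; // combined size of the segments being merged, in MB

        // Current behaviour: the old segments, the merged segment, and the
        // compound file all exist simultaneously before anything is deleted.
        long peakCurrent = s /* old */ + s /* merged */ + s /* compound */;

        // Proposed behaviour: the old segments are deleted once the merged
        // segment exists, so the compound file is built alongside only the
        // merged segment.
        long peakDuringMerge    = s /* old */ + s /* merged */;
        long peakDuringCompound = s /* merged */ + s /* compound */;
        long peakProposed = Math.max(peakDuringMerge, peakDuringCompound);

        System.out.println(peakCurrent);  // factor of 3
        System.out.println(peakProposed); // factor of 2
    }
}
```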
You solved the problem by deleting individual files from a segment earlier, while building the compound file. However, this means that CompoundFileWriter now deletes files in its close operation. That is not necessarily what one expects from a CompoundFileWriter: it should only generate a compound file, not delete the original files. That is also why you had to change the CompoundFileWriter tests accordingly.
I'm sorry for jumping into this late, but my impression was that the files being deleted were those of the new segment, not the files of the segments being merged. That, I think, would be OK: if the operation fails, the old files are still there, the new segment is never entered into the "segments" file, and the index remains uncorrupted. However, if we delete the previous segments first, we have no way of recovering from a failure during the merge process.
My idea now is to change IndexWriter so that during a merge all old segments are deleted before the compound file is generated. I think I thereby also avoid the factor of 3 and get a maximum factor of 2 in disk space. I committed my changes. Could you run the same test you did with your patch, to verify that my changes have the desired effect too? That would be great.
I'm sorry, Christoph, but I don't think these changes will work correctly. I just looked through the current CVS, and it seems to me there is a problem: the segmentInfos.write() calls in IndexWriter end up replacing the "segments" file with a new one that puts the newly created segment on-line. If writing of the compound file then fails, we end up with a corrupt index.
Another problem is that writing the compound file now happens under commit.lock, whereas before it happened outside of it. This is a potentially very lengthy operation, and it will prevent any new IndexReaders from being created for a long time, possibly minutes.
And taking the new call to createCompoundFile() out of the lock won't do either, because that would mean IndexReaders could be created during this time; they would be confused, since they would go after the new segment and try to open a half-constructed ".cfs" file.
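The ordering being argued for here can be summarized in a runnable toy model. All method names below are hypothetical stand-ins, not the actual IndexWriter internals: the point is only that the slow, failure-prone compound-file build happens before the "segments" file is rewritten, and old files are deleted only after the commit, so a crash at any step leaves a consistent index.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeOrdering {
    static final List<String> steps = new ArrayList<>();

    static void mergeIntoNewSegment() { steps.add("merge"); }
    static void createCompoundFile()  { steps.add("cfs"); }    // slow; no lock held
    static void commitSegmentsFile()  { steps.add("commit"); } // brief; under commit.lock
    static void deleteOldSegments()   { steps.add("delete"); }

    public static void main(String[] args) {
        mergeIntoNewSegment();
        createCompoundFile();  // a crash here leaves the old "segments" file intact
        commitSegmentsFile();  // readers only now see the new segment
        deleteOldSegments();   // safe: the index is already committed
        System.out.println(String.join(",", steps));
    }
}
```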
Again, I'm sorry, but I think I have to -1 these changes.
-1.
Dmitry.
Christoph
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]