Re: optimized disk usage when creating a compound index

Dmitry Serebrennikov Thu, 12 Aug 2004 11:44:07 -0700

Hi Christoph,

I agree that your approach achieves better disk usage than deleting segments as they are being merged into the compound file, chiefly because most indexes have one or two large files and the rest are small. I have not reviewed your latest code yet (it's a bit hard without a checked-out working copy from CVS; by the way, could you post diffs so others can review them more easily?), but based on what you are describing, here's what I think. It sounds like it would work, but it also sounds a bit kludgy. The main thing I don't like is that we are inventing another way of doing what Lucene already does: maintaining index integrity across filesystem changes and safely deleting files that are no longer needed. Lucene already has a mechanism for switching over to a new segments file, and renaming the cfs file amounts to a second, parallel mechanism for the same job.
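To make the comparison concrete, here is a minimal sketch of the write-to-tmp-then-rename idea as I understand it from your description. It assumes plain java.nio files rather than Lucene's Directory API; the class and method names are hypothetical, and the "compound file" is a plain concatenation, not Lucene's actual .cfs layout.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical sketch of "build the compound file under a tmp name, then rename".
// The output is a plain concatenation, NOT Lucene's real .cfs format.
class TmpThenRenameCompound {
    static Path buildCompound(Path dir, String segment, List<String> files) throws IOException {
        Path tmp = dir.resolve(segment + ".tmp");
        // 1. Write everything into "<segment>.tmp"; the original files stay intact.
        try (OutputStream out = Files.newOutputStream(tmp)) {
            for (String name : files) {
                Files.copy(dir.resolve(name), out);
            }
        }
        // 2. Rename to "<segment>.cfs" only once the tmp file is complete. A crash
        //    before this point leaves a stray .tmp file that must be cleaned up later.
        Path cfs = dir.resolve(segment + ".cfs");
        Files.move(tmp, cfs, StandardCopyOption.ATOMIC_MOVE);
        // 3. Only now delete the individual files that went into the compound file.
        for (String name : files) {
            Files.delete(dir.resolve(name));
        }
        return cfs;
    }
}
```

The rename in step 2 is exactly the piece that duplicates what the segments file already gives us: it is a second commit point with its own failure mode (the leftover .tmp).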

A note on the norms with .f and .s files - this is getting complicated...

One note on SegmentReader.files(): we should probably list the "tmp" extension there so that we can clean up segments that failed to create a cfs file.
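Roughly what I mean, as a sketch only: this is not the real SegmentReader.files() signature, the class name is hypothetical, and the extension list is illustrative rather than copied from the code. The point is just that once "tmp" is in the list, the same cleanup path that removes a segment's files also catches a compound file that failed mid-write.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a files() listing; NOT Lucene's SegmentReader API.
class SegmentFiles {
    // Illustrative per-segment extensions (an assumption). "tmp" is included so
    // that a half-written compound file is found and removed with the rest.
    static final String[] EXTENSIONS =
        {"fnm", "frq", "prx", "fdx", "fdt", "tii", "tis", "del", "cfs", "tmp"};

    // Returns the names of files belonging to `segment` that exist in `dir`.
    static List<String> files(Path dir, String segment) {
        List<String> result = new ArrayList<>();
        for (String ext : EXTENSIONS) {
            String name = segment + "." + ext;
            if (Files.exists(dir.resolve(name))) {
                result.add(name);
            }
        }
        return result;
    }

    // Deletes every listed file of `segment`, including any leftover .tmp.
    static void deleteAll(Path dir, String segment) throws IOException {
        for (String name : files(dir, segment)) {
            Files.delete(dir.resolve(name));
        }
    }
}
```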

Here's an alternative idea that leverages the existing Lucene segments file: could we simply create the compound file in a new segment? That way we don't have to invent the "tmp" file or change anything else about the files (such as the norms handling).
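A minimal sketch of that alternative, under loudly simplifying assumptions: the "segments" file here is a hypothetical text file with one segment name per line (the real one is a binary format), the compound file is again a plain concatenation, and all names are made up for illustration. The only commit point is the replacement of the segments file, which is the mechanism Lucene already relies on.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: build the compound file as a NEW segment, then commit by
// swapping the segment name in a (simplified, text) segments file.
class CompoundAsNewSegment {
    static void commitSegments(Path dir, List<String> names) throws IOException {
        Path next = dir.resolve("segments.new");
        Files.write(next, names);
        // The commit point: until this rename, readers still see the old segment.
        // On common platforms an atomic rename replaces the existing file.
        Files.move(next, dir.resolve("segments"), StandardCopyOption.ATOMIC_MOVE);
    }

    static void compoundIntoNewSegment(Path dir, String oldSeg, String newSeg,
                                       List<String> oldFiles) throws IOException {
        // 1. Write the compound file under the new segment's name.
        try (OutputStream out = Files.newOutputStream(dir.resolve(newSeg + ".cfs"))) {
            for (String name : oldFiles) {
                Files.copy(dir.resolve(name), out);
            }
        }
        // 2. Replace the old segment with the new one in the segments list.
        List<String> segs = new ArrayList<>(Files.readAllLines(dir.resolve("segments")));
        int pos = segs.indexOf(oldSeg);
        if (pos < 0) {
            throw new IllegalArgumentException("segment not listed: " + oldSeg);
        }
        segs.set(pos, newSeg);
        commitSegments(dir, segs);
        // 3. The old segment is no longer referenced, so its files can be deleted.
        for (String name : oldFiles) {
            Files.delete(dir.resolve(name));
        }
    }
}
```

A crash before step 2 leaves an unreferenced new segment, which is the same kind of garbage ordinary merging can already leave behind, rather than a new "tmp" case.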

All in all, I haven't been involved closely enough with the Lucene codebase lately, and this is starting to affect things like norms, locks, and merging, so I don't feel qualified to make the final call. I'd like to hear what Doug and others think. From my point of view, I don't see anything *wrong* with the latest set of changes (other than needing to add the "tmp" file to SegmentReader.files()), but it doesn't yet strike me as obviously the *right* way to do this either. So I'll change my vote to 0 and see what others think. :)

0.

Cheers.
Dmitry.
