Dmitry Serebrennikov wrote:
Hi Christoph,

I agree that your approach achieves better disk usage than deleting segments as they are merged into the compound file, chiefly because most indexes have one or two large files and the rest are small. I have not reviewed your latest code yet (it's a bit hard without a checked-out working copy of the CVS image; btw, could you post diffs so others can review them more readily?), but from what you are describing

I attach diffs for SegmentMerger, SegmentReader, and IndexWriter to this mail.

here's what I think. It sounds like it would work, but it also sounds a bit kludgy. The main thing that I don't like is that we are now inventing another way of doing what Lucene already does: maintaining index integrity across filesystem changes and safely deleting unneeded files. I'm thinking that Lucene already has a way of switching to the new segments file, but we are proposing something similar with the renaming of the .cfs file.

But I am using the deletion mechanism of IndexWriter.
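
Condensed from the attached IndexWriter diff, the compound-file step now looks roughly like this (a sketch, not the full hunk):

  if (useCompoundFile) {
    // build the compound file under a temporary name first
    final Vector filesToDelete = merger.createCompoundFile(mergedName + ".tmp");
    synchronized (directory) {  // in- & inter-process sync
      new Lock.With(directory.makeLock(COMMIT_LOCK_NAME), COMMIT_LOCK_TIMEOUT) {
        public Object doBody() throws IOException {
          // make the compound file visible for SegmentReaders
          directory.renameFile(mergedName + ".tmp", mergedName + ".cfs");
          // hand the now unused plain files to the existing deletion
          // machinery; anything we cannot delete ends up in "deletable"
          deleteFiles(filesToDelete);
          return null;
        }
      }.run();
    }
  }

So the only new piece is the rename; the visibility switch and the cleanup both go through the commit lock and the deletable-files handling IndexWriter already has.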


A note on the norms with .f and .s files - this is getting complicated...

The problem here is that if I use .f for changed norms while that .f file is still in the deletable list generated by an IndexWriter, the norm changes could get lost!
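
That is why the SegmentReader part of the diff writes changed norms of a compound-format segment to a separate ".s" file; condensed from the attached diff:

  // when norms are rewritten: never reuse a ".f" name for a .cfs segment,
  // since that name may already sit in IndexWriter's deletable list
  String fileName;
  if (cfsReader == null)
    fileName = segment + ".f" + number;   // plain multi-file segment
  else
    fileName = segment + ".s" + number;   // separate norms beside the .cfs file

On the reading side, openNorms then looks for the separate ".s" file first and only falls back to the ".f" entry inside the compound file (see the last SegmentReader hunk below).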


One note on SegmentReader.files() - we should probably have the "tmp" extension listed here so we can clean up segments that failed to create a .cfs file.

Yes, good idea.
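
This is not part of the attached diff yet; something along these lines in SegmentReader.files() should do (hypothetical sketch):

  // hypothetical addition to SegmentReader.files(): if a failed merge left a
  // temporary compound file behind, list it so it gets cleaned up together
  // with the segment's other files
  String tmpName = segment + ".tmp";
  if (directory().fileExists(tmpName))
    files.addElement(tmpName);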


Here's an alternative idea that leverages the existing Lucene segments file:
Could we simply create the compound file in a new segment? That way we wouldn't have to invent the "tmp" file or change anything else about the files (like the norms stuff).


All in all, I haven't been involved in the Lucene codebase closely enough lately, and this is starting to impact things like norms, locks, and merging, so I don't feel qualified to make the final call on this. I'd like to hear what Doug and others think. From my point of view, I don't really see anything *wrong* with the latest set of changes (we just need to add the "tmp" file to SegmentReader.files()), but it doesn't strike me as an obviously *right* way to do this yet either. So I'll change my vote to a 0 and see what others think. :)

Yes, let's wait for Doug and others for a final decision.

Christoph
Index: IndexWriter.java
===================================================================
RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/IndexWriter.java,v
retrieving revision 1.34
retrieving revision 1.37
diff -r1.34 -r1.37
124,126c124,127
<   /** Setting to turn on usage of a compound file. When on, multiple files
<    *  for each segment are merged into a single file once the segment creation
<    *  is finished. This is done regardless of what directory is in use.
---
>   /** Get the current setting of whether to use the compound file format.
>    *  Note that this just returns the value you set with setUseCompoundFile(boolean)
>    *  or the default. You cannot use this to query the status of an existing index.
>    *  @see #setUseCompoundFile(boolean)
402,403c403,404
<     String mergedName = newSegmentName();
<     SegmentMerger merger = new SegmentMerger(directory, mergedName, useCompoundFile);
---
>     final String mergedName = newSegmentName();
>     SegmentMerger merger = new SegmentMerger(directory, mergedName);
425c426
<       new Lock.With(directory.makeLock("commit.lock"), COMMIT_LOCK_TIMEOUT) {
---
>       new Lock.With(directory.makeLock(COMMIT_LOCK_NAME), COMMIT_LOCK_TIMEOUT) {
432a434,448
>     
>     if (useCompoundFile) {
>       final Vector filesToDelete = merger.createCompoundFile(mergedName + ".tmp");
>       synchronized (directory) { // in- & inter-process sync
>         new Lock.With(directory.makeLock(COMMIT_LOCK_NAME), COMMIT_LOCK_TIMEOUT) {
>           public Object doBody() throws IOException {
>             // make compound file visible for SegmentReaders
>             directory.renameFile(mergedName + ".tmp", mergedName + ".cfs");
>             // delete now unused files of segment 
>             deleteFiles(filesToDelete);   
>             return null;
>           }
>         }.run();
>       }
>     }
480c496
<     String mergedName = newSegmentName();
---
>     final String mergedName = newSegmentName();
483c499
<         new SegmentMerger(directory, mergedName, useCompoundFile);
---
>         new SegmentMerger(directory, mergedName);
511c527
<       new Lock.With(directory.makeLock(IndexWriter.COMMIT_LOCK_NAME), COMMIT_LOCK_TIMEOUT) {
---
>       new Lock.With(directory.makeLock(COMMIT_LOCK_NAME), COMMIT_LOCK_TIMEOUT) {
519c535,549
< 
---
>     
>     if (useCompoundFile) {
>       final Vector filesToDelete = merger.createCompoundFile(mergedName + ".tmp");
>       synchronized (directory) { // in- & inter-process sync
>         new Lock.With(directory.makeLock(COMMIT_LOCK_NAME), COMMIT_LOCK_TIMEOUT) {
>           public Object doBody() throws IOException {
>             // make compound file visible for SegmentReaders
>             directory.renameFile(mergedName + ".tmp", mergedName + ".cfs");
>             // delete now unused files of segment 
>             deleteFiles(filesToDelete);   
>             return null;
>           }
>         }.run();
>       }
>     }
522,525c552,557
<   /* Some operating systems (e.g. Windows) don't permit a file to be deleted
<      while it is opened for read (e.g. by another process or thread).  So we
<      assume that when a delete fails it is because the file is open in another
<      process, and queue the file for subsequent deletion. */
---
>   /*
>    * Some operating systems (e.g. Windows) don't permit a file to be deleted
>    * while it is opened for read (e.g. by another process or thread). So we
>    * assume that when a delete fails it is because the file is open in another
>    * process, and queue the file for subsequent deletion.
>    */
540a573,579
>   }
>   
>   private final void deleteFiles(Vector files) throws IOException {
>       Vector deletable = new Vector();
>       deleteFiles(readDeleteableFiles(), deletable); // try to delete deleteable
>       deleteFiles(files, deletable);     // try to delete our files
>       writeDeleteableFiles(deletable);        // note files we can't delete
Index: SegmentMerger.java
===================================================================
RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/SegmentMerger.java,v
retrieving revision 1.11
retrieving revision 1.13
diff -r1.11 -r1.13
40d39
<   private boolean useCompoundFile;
59d57
<    * @param compoundFile true if the new segment should use a compoundFile
61c59
<   SegmentMerger(Directory dir, String name, boolean compoundFile) {
---
>   SegmentMerger(Directory dir, String name) {
64d61
<     useCompoundFile = compoundFile;
99,101d95
<     if (useCompoundFile)
<       createCompoundFile();
< 
117c111
<   private final void createCompoundFile()
---
>   final Vector createCompoundFile(String fileName)
120c114
<             new CompoundFileWriter(directory, segment + ".cfs");
---
>             new CompoundFileWriter(directory, fileName);
122,123c116,117
<     ArrayList files =
<       new ArrayList(COMPOUND_EXTENSIONS.length + fieldInfos.size());    
---
>     Vector files =
>       new Vector(COMPOUND_EXTENSIONS.length + fieldInfos.size());    
153,158c147,148
<         
<     // Now delete the source files
<     it = files.iterator();
<     while (it.hasNext()) {
<       directory.deleteFile((String) it.next());
<     }
---
>    
>     return files;
Index: SegmentReader.java
===================================================================
RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/SegmentReader.java,v
retrieving revision 1.24
retrieving revision 1.25
diff -r1.24 -r1.25
36c36
<  * @version $Id: SegmentReader.java,v 1.24 2004/08/06 20:50:29 dnaber Exp $
---
>  * @version $Id: SegmentReader.java,v 1.25 2004/08/11 17:37:52 goller Exp $
56c56
<   CompoundFileReader cfsReader;
---
>   CompoundFileReader cfsReader = null;
78c78,84
<       String fileName = segment + ".f" + number;
---
>       String fileName;
>       if(cfsReader == null)
>           fileName = segment + ".f" + number;
>       else{ 
>           // use a different file name if we have compound format
>           fileName = segment + ".s" + number;
>       }
219,220c225,233
<       if (fi.isIndexed)
<         files.addElement(segment + ".f" + i);
---
>       if (fi.isIndexed){
>         String name;
>         if(cfsReader == null)
>             name = segment + ".f" + i;
>         else
>             name = segment + ".s" + i;
>         if (directory().fileExists(name))
>             files.addElement(name);
>       }
366,368c379,385
<         String fileName = segment + ".f" + fi.number;
<         // look first for re-written file, then in compound format
<         Directory d = directory().fileExists(fileName) ? directory() : cfsDir;
---
>         // look first if there are separate norms in compound format
>         String fileName = segment + ".s" + fi.number;
>         Directory d = directory();
>         if(!d.fileExists(fileName)){
>             fileName = segment + ".f" + fi.number;
>             d = cfsDir;
>         }
