Sorry for the delay. I was busy with other things and it took some time to fix the problems that Doug mentioned.
Furthermore, thanks for the links concerning dublicate checking.
I would like to commit my timestamp patch in the next few days. So my question goes to the mailing list whether there are any objections.
Doug Cutting schrieb:
I like this approach, but I have two concerns:
1. For back-compatibility, the version number should be written at the end of the "segments" file, not at its start. This may make things a big slower, but we should not incompatibly change the file format: existing indexes should still open correctly.
done. I also made some tests. Reading and old index and changing it works now.
2. There should be public IndexReader.getVersion methods, correspodning to the existing IndexReader.getLastModified methods. The lastModified methods should be deprecated, with a comment indicating that the getVersion methods should be used instead.
done
please have a look at the latest version of this patch in this mail (again to be applied to index directory).
Some thoughts about FilterIndexReader: I noticed that all synchronization for a FilterIndexReader is done by the FilterIndexReader itself, more precisely by its superclass IndexReader and its methods delete and close. The synchronization is not done by FilterIndexReader's member in. Since I introduced a SegmentInfos member in IndexReader for synchronization purposes it is needed there also for the superclass of a FilterIndexReader. I changed the constructor of FilterIndexReader to achieve that.
Christoph
Index: FilterIndexReader.java =================================================================== RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/FilterIndexReader.java,v retrieving revision 1.3 diff -u -r1.3 FilterIndexReader.java --- FilterIndexReader.java 6 Nov 2003 15:30:57 -0000 1.3 +++ FilterIndexReader.java 19 Nov 2003 14:23:36 -0000 @@ -114,6 +114,7 @@ public FilterIndexReader(IndexReader in) { super(in.directory()); + segmentInfos = in.segmentInfos; this.in = in; } Index: IndexReader.java =================================================================== RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/IndexReader.java,v retrieving revision 1.22 diff -u -r1.22 IndexReader.java --- IndexReader.java 21 Oct 2003 18:24:23 -0000 1.22 +++ IndexReader.java 19 Nov 2003 14:23:36 -0000 @@ -83,14 +83,14 @@ public abstract class IndexReader { protected IndexReader(Directory directory) { this.directory = directory; - segmentInfosAge = Long.MAX_VALUE; + stale = false; + segmentInfos = null; } private Directory directory; private Lock writeLock; - - //used to determine whether index has chaged since reader was opened - private long segmentInfosAge; + protected SegmentInfos segmentInfos; + private boolean stale; /** Returns an IndexReader reading the index in an FSDirectory in the named path. */ @@ -111,21 +111,16 @@ directory.makeLock(IndexWriter.COMMIT_LOCK_NAME), IndexWriter.COMMIT_LOCK_TIMEOUT) { public Object doBody() throws IOException { - IndexReader result = null; - SegmentInfos infos = new SegmentInfos(); infos.read(directory); if (infos.size() == 1) { // index is optimized - result = new SegmentReader(infos.info(0), true); + return new SegmentReader(infos, infos.info(0), true); } else { SegmentReader[] readers = new SegmentReader[infos.size()]; for (int i = 0; i < infos.size(); i++) - readers[i] = new SegmentReader(infos.info(i), i==infos.size()-1); - result = new SegmentsReader(directory, readers); + readers[i] = new SegmentReader(infos, infos.info(i), i==infos.size()-1); + return new SegmentsReader(infos, directory, readers); } - - result.segmentInfosAge = lastModified(directory); - return result; } }.run(); } @@ -134,20 +129,89 @@ /** Returns the directory this index resides in. */ public Directory directory() { return directory; } - /** Returns the time the index in the named directory was last modified. */ + /** + * Returns the time the index in the named directory was last modified. + * + * <p>Synchronization of IndexReader and IndexWriter instances is + * no longer done via time stamps of the segmets file since the time resolution + * depends on the hardware platform. Instead, a version number is maintained + * within the segments file, which is incremented everytime when the index is + * changed.</p> + * + * @deprecated Replaced by [EMAIL PROTECTED] #getCurrentVersion(String)} + * */ public static long lastModified(String directory) throws IOException { return lastModified(new File(directory)); } - /** Returns the time the index in the named directory was last modified. */ + /** + * Returns the time the index in the named directory was last modified. + * + * <p>Synchronization of IndexReader and IndexWriter instances is + * no longer done via time stamps of the segmets file since the time resolution + * depends on the hardware platform. Instead, a version number is maintained + * within the segments file, which is incremented everytime when the index is + * changed.</p> + * + * @deprecated Replaced by [EMAIL PROTECTED] #getCurrentVersion(File)} + * */ public static long lastModified(File directory) throws IOException { return FSDirectory.fileModified(directory, "segments"); } - /** Returns the time the index in this directory was last modified. */ + /** + * Returns the time the index in the named directory was last modified. + * + * <p>Synchronization of IndexReader and IndexWriter instances is + * no longer done via time stamps of the segmets file since the time resolution + * depends on the hardware platform. Instead, a version number is maintained + * within the segments file, which is incremented everytime when the index is + * changed.</p> + * + * @deprecated Replaced by [EMAIL PROTECTED] #getCurrentVersion(Directory)} + * */ public static long lastModified(Directory directory) throws IOException { return directory.fileModified("segments"); } + + /** + * Reads version number from segments files. The version number counts the + * number of changes of the index. + * + * @param directory where the index resides. + * @return version number. + * @throws IOException if segments file cannot be read + */ + public static long getCurrentVersion(String directory) throws IOException { + return getCurrentVersion(new File(directory)); + } + + /** + * Reads version number from segments files. The version number counts the + * number of changes of the index. + * + * @param directory where the index resides. + * @return version number. + * @throws IOException if segments file cannot be read + */ + public static long getCurrentVersion(File directory) throws IOException { + Directory dir = FSDirectory.getDirectory(directory, false); + long version = getCurrentVersion(dir); + dir.close(); + return version; + } + + /** + * Reads version number from segments files. The version number counts the + * number of changes of the index. + * + * @param directory where the index resides. + * @return version number. + * @throws IOException if segments file cannot be read. + */ + public static long getCurrentVersion(Directory directory) throws IOException { + return SegmentInfos.readCurrentVersion(directory); + } /** * Returns <code>true</code> if an index exists at the specified directory. @@ -274,6 +338,9 @@ this will be corrected eventually as the index is further modified. */ public final synchronized void delete(int docNum) throws IOException { + if(stale) + throw new IOException("IndexReader out of date and no longer valid for deletion"); + if (writeLock == null) { Lock writeLock = directory.makeLock(IndexWriter.WRITE_LOCK_NAME); if (!writeLock.obtain(IndexWriter.WRITE_LOCK_TIMEOUT)) // obtain write lock @@ -282,11 +349,11 @@ // we have to check whether index has changed since this reader was opened. // if so, this reader is no longer valid for deletion - if(lastModified(directory) > segmentInfosAge){ + if(segmentInfos != null && SegmentInfos.readCurrentVersion(directory) > segmentInfos.getVersion()){ + stale = true; this.writeLock.release(); this.writeLock = null; - throw new IOException( - "IndexReader out of date and no longer valid for deletion"); + throw new IOException("IndexReader out of date and no longer valid for deletion"); } } doDelete(docNum); Index: SegmentInfos.java =================================================================== RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/SegmentInfos.java,v retrieving revision 1.2 diff -u -r1.2 SegmentInfos.java --- SegmentInfos.java 18 Nov 2003 11:35:57 -0000 1.2 +++ SegmentInfos.java 19 Nov 2003 14:23:36 -0000 @@ -61,22 +61,28 @@ import org.apache.lucene.store.OutputStream; final class SegmentInfos extends Vector { - public int counter = 0; // used to name new segments + public int counter = 0; // used to name new segments + private long version = 0; //counts how often the index has been changed by adding or deleting docs public final SegmentInfo info(int i) { - return (SegmentInfo)elementAt(i); + return (SegmentInfo) elementAt(i); } public final void read(Directory directory) throws IOException { InputStream input = directory.openFile("segments"); try { - counter = input.readInt(); // read counter + counter = input.readInt(); // read counter for (int i = input.readInt(); i > 0; i--) { // read segmentInfos - SegmentInfo si = new SegmentInfo(input.readString(), input.readInt(), - directory); + SegmentInfo si = + new SegmentInfo(input.readString(), input.readInt(), directory); addElement(si); } - } finally { + if (input.getFilePointer() >= input.length()) + version = 0; // old file format without version number + else + version = input.readLong(); // read version + } + finally { input.close(); } } @@ -84,18 +90,41 @@ public final void write(Directory directory) throws IOException { OutputStream output = directory.createFile("segments.new"); try { - output.writeInt(counter); // write counter - output.writeInt(size()); // write infos + output.writeInt(counter); // write counter + output.writeInt(size()); // write infos for (int i = 0; i < size(); i++) { SegmentInfo si = info(i); output.writeString(si.name); output.writeInt(si.docCount); } - } finally { + output.writeLong(++version); // every write changes the index + } + finally { output.close(); } // install new segment info directory.renameFile("segments.new", "segments"); + } + + /** + * version number when this SegmentInfos was generated. + */ + public long getVersion() { + return version; + } + + /** + * Current version number from segments file. + */ + public static long readCurrentVersion(Directory directory) + throws IOException { + + // We cannot be sure whether the segments file is in the old format or the new one. + // Therefore we have to read the whole file and cannot simple seek to the version entry. + + SegmentInfos sis = new SegmentInfos(); + sis.read(directory); + return sis.getVersion(); } } Index: SegmentReader.java =================================================================== RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/SegmentReader.java,v retrieving revision 1.15 diff -u -r1.15 SegmentReader.java --- SegmentReader.java 21 Oct 2003 18:24:23 -0000 1.15 +++ SegmentReader.java 19 Nov 2003 14:23:37 -0000 @@ -98,10 +98,11 @@ } private Hashtable norms = new Hashtable(); - SegmentReader(SegmentInfo si, boolean closeDir) + SegmentReader(SegmentInfos sis, SegmentInfo si, boolean closeDir) throws IOException { this(si); closeDirectory = closeDir; + segmentInfos = sis; } SegmentReader(SegmentInfo si) @@ -141,7 +142,10 @@ public Object doBody() throws IOException { deletedDocs.write(directory(), segment + ".tmp"); directory().renameFile(segment + ".tmp", segment + ".del"); - directory().touchFile("segments"); + if(segmentInfos != null) + segmentInfos.write(directory()); + else + directory().touchFile("segments"); return null; } }.run(); Index: SegmentsReader.java =================================================================== RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/SegmentsReader.java,v retrieving revision 1.15 diff -u -r1.15 SegmentsReader.java --- SegmentsReader.java 31 Oct 2003 09:46:54 -0000 1.15 +++ SegmentsReader.java 19 Nov 2003 14:23:37 -0000 @@ -78,6 +78,11 @@ private int numDocs = -1; private boolean hasDeletions = false; + SegmentsReader(SegmentInfos sis, Directory directory, SegmentReader[] r) throws IOException { + this(directory, r); + segmentInfos = sis; + } + SegmentsReader(Directory directory, SegmentReader[] r) throws IOException { super(directory); readers = r;
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]