First, I would like to apologise for this message being so long, but I have tried to provide sufficient information for someone to help me diagnose my problem - namely that using the latest build of Lucene corrupts my indexes, whereas earlier Lucene releases from SourceForge have worked fine.
I'm afraid I can only describe the symptoms of my problem (see below), as I haven't been able to pinpoint the actual cause; however, I suspect the problem lies in the thread-safety/index-locking changes introduced recently (and included in the Jakarta release lucene-1.2-rc1). In particular, I have noticed that the method FSDirectory.getDirectory(File file, boolean create) does not take the 'create' parameter into account when choosing whether to erase existing contents, contrary to what its Javadoc comment suggests. I have attached the relevant code extract below.

I am surprised that no one else has reported this problem; perhaps no one else has upgraded their version of Lucene, or carried out detailed tests since upgrading? Or perhaps everyone else restarts their Java server (JVM) when rebuilding/updating their indexes? I am able to re-create the same problems each time I follow the three steps described below.

I am a bit lost at the moment as to what to investigate to track down the cause of the problematic files in my index directory. Does anyone else keep their JVM running continuously between creating and updating indexes? If so, have you encountered problems updating indexes using the latest code? (Scott Ganyo: perhaps you have a similar set-up to mine, since you reported the exception "java.io.IOException: /index/_1x7f.fnm already exists" last month?) If anyone has a good understanding of how and when the numerous index files are generated/updated, could you give me any tips on what to look into to identify the cause of my problem?

Any help gratefully received.

Joanne

================================================================
Symptoms
--------

Step 1 - Rebuild index from scratch
-----------------------------------
If I re-build an index from scratch, then the first time around all files in my index directory are deleted and new ones are successfully generated, allowing me to search and retrieve documents.
The index directory contains all the expected files, e.g. .f1, .f2, .fdt, .fnm etc.

Step 2 - Modify data & update index
-----------------------------------
Without restarting the Java server, I then update a single document and choose to 'update' (not rebuild) the index. My application 'appears' to have completed the task successfully, i.e. no exceptions are reported (or perhaps I am just not trapping them?). However, when I search for a new word added to the updated document, the 'hit' is not reported; if I search for a word that was removed from the document, the 'hit' is still reported.

When I take a closer look at the index directory I see several anomalies. I have TWO copies of the following files: .f1, .f2, .f4, .f5, .f7, .fdt, .fdx, .frq, .prx and .tis (with the segment prefixes _i and _k). However, I have only one copy of deletable, segments, .fnm and .tii. Hence it is as though the index data has not been replaced, although the .fnm and .tii files have been updated.

Step 3 - Attempt to rebuild now the file has been updated
---------------------------------------------------------
Again without stopping the server, I tried to re-build the index from scratch. This time the existing files are not deleted (this is due to the code included below, since the FSDirectory instance dir is not null). Again, no exceptions are reported (or perhaps trapped), so the task 'appears' to have completed successfully. The index directory still contains the old _k segment files (.f1, .f2, .f4, .f5, .f7, .fdt, .fdx, .fnm, .frq, .prx, .tii, .tis) and now contains just two new segment files, _i.tii and _i.fnm (along with updated deletable and segments files). When I attempt to search this index, an exception is raised from the call to IndexReader.open(dir) with the message "<index_di>\_i.fdt (The system cannot find the file specified)".
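For what it's worth, the behaviour I see in Step 3 can be modelled without Lucene at all: a static cache that keys directory instances by canonical path, but only honours the 'create' flag when constructing a *new* instance, will leave stale files behind on a second "rebuild". Below is a minimal, self-contained sketch (the class and method names are my own invention, not Lucene's, and it uses a modern java.io.File method or two) that reproduces the symptom:

```java
import java.io.File;
import java.io.IOException;
import java.util.Hashtable;

// Minimal model of a per-path directory cache that, like the FSDirectory
// extract further down, only honours 'create' when it constructs a *new*
// instance. (Illustrative only - not Lucene code.)
class CachedDir {
    private static final Hashtable CACHE = new Hashtable();
    final File path;

    private CachedDir(File path, boolean create) {
        this.path = path;
        if (create) {
            erase();          // first call with create=true wipes the dir
        }
        path.mkdirs();
    }

    // Same shape as FSDirectory.getDirectory: on a cache hit, 'create'
    // is silently ignored, so a second rebuild never erases old files.
    static synchronized CachedDir getDirectory(File file, boolean create)
            throws IOException {
        file = new File(file.getCanonicalPath());
        CachedDir dir = (CachedDir) CACHE.get(file);
        if (dir == null) {
            dir = new CachedDir(file, create);
            CACHE.put(file, dir);
        }
        return dir;
    }

    private void erase() {
        File[] files = path.listFiles();
        if (files != null) {
            for (int i = 0; i < files.length; i++) {
                files[i].delete();
            }
        }
    }
}
```

With this model, the first getDirectory(dir, true) erases old files, but a second call with create=true returns the cached instance and deletes nothing - exactly the stale _k files I am seeing.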
Additional Info
---------------
If I restart the JVM between each of the above steps, then no problems are encountered; I guess because the problematic classes are re-initialised each time around. Before I upgraded to the latest code, which introduced the .lock files, I experienced a different error message (also reported by Scott Ganyo on 27 Sep): previously the exception "java.io.IOException: /<index dir>/_i.fnm already exists" was generated when attempting to re-build or update. This exception is no longer thrown with the most recent changes, as the check has been removed from the constructor FSOutputStream.FSOutputStream(File path).

***** Extract from the class FSDirectory ********

  /** This cache of directories ensures that there is a unique Directory
   * instance per path, so that synchronization on the Directory can be used to
   * synchronize access between readers and writers.
   *
   * This should be a WeakHashMap, so that entries can be GC'd, but that would
   * require Java 1.2.  Instead we use refcounts...
   */
  private static final Hashtable DIRECTORIES = new Hashtable();
  :
  :
  /** Returns the directory instance for the named location.
   *
   * <p>Directories are cached, so that, for a given canonical path, the same
   * FSDirectory instance will always be returned.  This permits
   * synchronization on directories.
   *
   * @param file the path to the directory.
   * @param create if true, create, or erase any existing contents.
   * @returns the FSDirectory for the named file.
   */
  public static FSDirectory getDirectory(File file, boolean create)
      throws IOException {
    file = new File(file.getCanonicalPath());
    FSDirectory dir;
    synchronized (DIRECTORIES) {
      dir = (FSDirectory)DIRECTORIES.get(file);
      /* JSproston: a second rebuild will not create a *new* FSDirectory,
       * since dir will NOT be null; thus even if create is true, the
       * existing contents will not be erased.
       */
      if (dir == null) {
        dir = new FSDirectory(file, create);
        DIRECTORIES.put(file, dir);
      }
    }
    synchronized (dir) {
      dir.refCount++;
    }
    return dir;
  }

***** End of Extract ********
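In case it helps the discussion, here is the shape of one possible fix - just my guess at what could be done, not a patch against the real FSDirectory: honour 'create' on a cache hit as well, by erasing the directory's contents before handing back the cached instance. Sketched as a self-contained stand-in class (names are mine, not Lucene's):

```java
import java.io.File;
import java.io.IOException;
import java.util.Hashtable;

// Self-contained sketch (not Lucene code) of a cache that honours the
// 'create' flag even when the directory instance is already cached.
class FixedCachedDir {
    private static final Hashtable CACHE = new Hashtable();
    final File path;

    private FixedCachedDir(File path) {
        this.path = path;
        path.mkdirs();
    }

    static synchronized FixedCachedDir getDirectory(File file, boolean create)
            throws IOException {
        file = new File(file.getCanonicalPath());
        FixedCachedDir dir = (FixedCachedDir) CACHE.get(file);
        if (dir == null) {
            dir = new FixedCachedDir(file);
            CACHE.put(file, dir);
        }
        if (create) {
            dir.erase();   // honour 'create' even on a cache hit
        }
        return dir;
    }

    private void erase() {
        File[] files = path.listFiles();
        if (files != null) {
            for (int i = 0; i < files.length; i++) {
                files[i].delete();
            }
        }
    }
}
```

Presumably a real fix would also need to co-ordinate with any readers/writers currently synchronized on the cached instance, which is exactly the thread-safety area I suspect - so I offer this only to illustrate where the 'create' flag appears to be dropped.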