Yesterday I was inspired by the conversation on the dev list about in-memory indexing, and I wrote a new version of IndexWriter.java (named IndexWriter2.java). The file is attached. The code is stable and worth a try. The following is from the Javadoc for this file:
/**
 * IndexWriter2 is a modification of the original IndexWriter that comes
 * with Lucene. It benefits from a RAMDirectory, which IndexWriter has
 * as well. The original IndexWriter treats the segments in the RAMDirectory
 * no differently from the segments in the target directory, where the index
 * is being built. For example, it ALWAYS merges RAMDirectory segments into
 * the target directory. Here, we optimize the usage of the RAMDirectory in
 * the following way:<br>
 *
 * When a new Document is added, a new segment for it is created in the
 * RAMDirectory. When the RAMDirectory has collected 'maxDocsInRam' (this is
 * a new, important setting; the default is 10000) 1-document segments,
 * IndexWriter2 merges them into one 10000-document segment inside the
 * RAMDirectory (this is a difference from IndexWriter). Then it moves this
 * segment from the RAMDirectory to the target directory (usually a file
 * system directory). This way, during indexing, IndexWriter2 writes segments
 * of equal size (equal to maxDocsInRam) to the target directory. In other
 * words, during indexing only one file-system segment is opened and dealt
 * with, which uses just a few file handles. No more "Too many open files"
 * exceptions.<br>
 *
 * After indexing is finished, it is good to call optimize() to merge all
 * created segments into one. The RAMDirectory is out of the picture here and
 * is not used. This is where the mergeFactor setting comes in:
 * a total of mergeFactor+1 segments are merged at once into one new
 * segment. This happens in a loop, until only 1 segment is left.
 * Here you can still get a "Too many open files" exception if your
 * mergeFactor is large. If you set mergeFactor to 1, it will merge only
 * 2 segments at a time, which conserves file handles but is a bit slower
 * than a merge with mergeFactor=10, for example.<br>
 *
 * At the end of mergeSegments() there was originally code that, if a
 * segment file could not be deleted (because it is currently open in
 * Windows), stored its name in a file named 'deletable', so that it could
 * try to delete it later. I believe there was a bug with not closing the
 * merged segments properly, which was the reason for all of this. Anyway,
 * there are now no problems with deleting these files on Windows, and
 * therefore the code that reads and writes the 'deletable' file is
 * commented out.<br>
 *
 * @author Ivaylo Zlatev ([EMAIL PROTECTED])
 */

Two weeks ago I sent an improved PriorityQueue, fixing important memory issues and much more. I just wasted my time: no response at all. Hopefully this time my code will be more useful.

Regards,
Ivaylo

<<IndexWriter2.java>>
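For readers who want to see the segment arithmetic without opening the attachment, here is a minimal toy model of the scheme described in the Javadoc. It is not the code from IndexWriter2.java and it does not use Lucene at all: a segment is represented only by its document count, and the class name SegmentBufferSketch and its methods are hypothetical names made up for this sketch. Two details are assumptions rather than something the Javadoc states: leftover documents in RAM are flushed as a smaller final segment, and optimize() merges the most recently written segments first.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: a "segment" here is just its document count. */
class SegmentBufferSketch {
    private final int maxDocsInRam; // 1-document segments buffered in RAM before a flush
    private final int mergeFactor;  // optimize() merges mergeFactor + 1 segments per pass
    private final List<Integer> ramSegments = new ArrayList<>();
    private final List<Integer> diskSegments = new ArrayList<>();

    SegmentBufferSketch(int maxDocsInRam, int mergeFactor) {
        if (maxDocsInRam < 1 || mergeFactor < 1) {
            throw new IllegalArgumentException("both settings must be >= 1");
        }
        this.maxDocsInRam = maxDocsInRam;
        this.mergeFactor = mergeFactor;
    }

    /** Each added document becomes a 1-document segment in RAM. */
    void addDocument() {
        ramSegments.add(1);
        if (ramSegments.size() >= maxDocsInRam) {
            flushRam();
        }
    }

    /** Merge all RAM segments into one and move it to the target directory. */
    void flushRam() {
        if (ramSegments.isEmpty()) {
            return;
        }
        int docs = 0;
        for (int n : ramSegments) {
            docs += n;
        }
        ramSegments.clear();
        diskSegments.add(docs);
    }

    /** Merge mergeFactor + 1 disk segments per pass until one is left; returns the pass count. */
    int optimize() {
        flushRam(); // assumption: leftover RAM documents are written out first
        int passes = 0;
        while (diskSegments.size() > 1) {
            int count = Math.min(mergeFactor + 1, diskSegments.size());
            // assumption: the newest segments (end of the list) are merged first
            List<Integer> tail = diskSegments.subList(diskSegments.size() - count, diskSegments.size());
            int docs = 0;
            for (int n : tail) {
                docs += n;
            }
            tail.clear();           // removes the merged segments from diskSegments
            diskSegments.add(docs); // the single merged segment replaces them
            passes++;
        }
        return passes;
    }

    /** Snapshot of the document counts of the segments in the target directory. */
    List<Integer> diskSegments() {
        return new ArrayList<>(diskSegments);
    }

    public static void main(String[] args) {
        SegmentBufferSketch writer = new SegmentBufferSketch(3, 1);
        for (int i = 0; i < 10; i++) {
            writer.addDocument();
        }
        writer.flushRam();
        System.out.println(writer.diskSegments()); // prints [3, 3, 3, 1]
        int passes = writer.optimize();
        System.out.println(passes + " passes -> " + writer.diskSegments()); // prints 3 passes -> [10]
    }
}
```

With maxDocsInRam=3 and mergeFactor=1, ten documents produce four disk segments and optimize() needs three passes of two segments each. With mergeFactor=10 the same four segments would be merged in a single pass, at the cost of having more segments open at once.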
