> From: Winton Davies [mailto:[EMAIL PROTECTED]]
> (2) How can I avoid the FD problem? I know about parallelizing the
> indexing, but I'd like to get an efficient single index before doing
> that. If I could set the Merge Factor up really high, then I think I'd
> be able to work
Assume that you can comfortably hold a 100,000 document index in RAM.
You might try something like:
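  IndexWriter writer = new IndexWriter(...);
  // merge only once 100,000 single-document RAM segments have piled up
  writer.mergeFactor = 100000;
  writer.maxMergeDocs = 100000;
  ... add all your documents ...
  // lower the merge factor and lift the size cap for the final merge
  writer.mergeFactor = 100;
  writer.maxMergeDocs = Integer.MAX_VALUE;
  writer.optimize();
  writer.close();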
Each added document is first indexed as its own single-document index in a
RAMDirectory. Setting mergeFactor == maxMergeDocs means that the writer will
only do RAM->FS merging, never FS->FS merging, so very few file handles are
used.
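To see why equal settings rule out FS->FS merges, here is a toy model of the merge decision. It is a rough sketch written from memory of how the writer picks segments to merge, not the actual IndexWriter code. The class MergeSketch, its Stats fields, and the rule that any segment holding more than one document lives on disk are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeSketch {

    /** Counters collected over one simulated indexing run. */
    public static final class Stats {
        public int ramToDiskMerges;      // merges whose inputs were all one-doc RAM segments
        public int diskToDiskMerges;     // merges that read at least one on-disk segment
        public int maxDiskSegmentsOpen;  // most on-disk segments read by a single merge
        public int finalSegments;        // segments left before any optimize()
    }

    public static Stats simulate(int numDocs, long mergeFactor, long maxMergeDocs) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        List<Long> segments = new ArrayList<>(); // doc count per segment, oldest first
        Stats stats = new Stats();
        for (int doc = 0; doc < numDocs; doc++) {
            segments.add(1L); // assumption: every new document starts as a one-doc RAM segment
            long target = mergeFactor;
            while (target <= maxMergeDocs) {
                // Walk back from the newest segment, collecting those smaller than target.
                int min = segments.size();
                long mergeDocs = 0;
                while (--min >= 0) {
                    long count = segments.get(min);
                    if (count >= target) {
                        break;
                    }
                    mergeDocs += count;
                }
                if (mergeDocs < target) {
                    break; // not enough small segments yet
                }
                List<Long> inputs = segments.subList(min + 1, segments.size());
                int diskInputs = 0;
                for (long count : inputs) {
                    if (count > 1) {
                        diskInputs++; // assumption: multi-doc segments live on disk
                    }
                }
                if (diskInputs == 0) {
                    stats.ramToDiskMerges++;
                } else {
                    stats.diskToDiskMerges++;
                    stats.maxDiskSegmentsOpen = Math.max(stats.maxDiskSegmentsOpen, diskInputs);
                }
                inputs.clear();          // drop the merged inputs...
                segments.add(mergeDocs); // ...and append the merged segment
                target *= mergeFactor;   // then check the next level up
            }
        }
        stats.finalSegments = segments.size();
        return stats;
    }

    public static void main(String[] args) {
        Stats capped = simulate(5000, 1000, 1000);
        Stats uncapped = simulate(5000, 10, Integer.MAX_VALUE);
        System.out.println("capped:   RAM->FS " + capped.ramToDiskMerges
                + ", FS->FS " + capped.diskToDiskMerges);
        System.out.println("uncapped: RAM->FS " + uncapped.ramToDiskMerges
                + ", FS->FS " + uncapped.diskToDiskMerges
                + ", max disk segments in one merge " + uncapped.maxDiskSegmentsOpen);
    }
}
```

In this model, indexing 5,000 documents with mergeFactor == maxMergeDocs == 1000 does five RAM->FS merges and no FS->FS merges. With mergeFactor 10 and no cap it does 55 FS->FS merges, each reading ten on-disk segments at once, and that is where the file handles go.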
A more efficient and slightly more complex approach would be to build large
indexes in RAM, and copy them to disk with IndexWriter.addIndexes:
  IndexWriter fsWriter = new IndexWriter(new File(...), analyzer, true);
  while (... more docs to index ...) {
    // build each batch entirely in memory
    RAMDirectory ramDir = new RAMDirectory();
    IndexWriter ramWriter = new IndexWriter(ramDir, analyzer, true);
    ... add 100,000 docs to ramWriter ...
    ramWriter.optimize();
    ramWriter.close();
    // copy the finished in-memory index into the on-disk index
    fsWriter.addIndexes(new Directory[] { ramDir });
  }
  fsWriter.optimize();
  fsWriter.close();
This second approach is broken in the current release, so use a nightly build
if you want to try it.
If you try these, please report back on how well they work.
Doug
_______________________________________________
Lucene-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/lucene-users