> From: Winton Davies [mailto:[EMAIL PROTECTED]]
>   (2) How can I avoid the FD problem?  I know about parallelizing the
> indexing, but I'd like to get an efficient single index before doing
> that.  If I could set the Merge Factor up real high, then I think I'd
> be able to work

Assume that you can comfortably hold a 100,000 document index in RAM.

You might try something like:
   IndexWriter writer = new IndexWriter(...);
   writer.mergeFactor = 100000;
   writer.maxMergeDocs = 100000;

   ... add all your documents ...

   writer.mergeFactor = 100;
   writer.maxMergeDocs = Integer.MAX_VALUE;
   writer.optimize();
   writer.close();

The initial single-document indexes are created in a RAMDirectory.
Setting mergeFactor == maxMergeDocs means that only RAM->FS merging is
done, never FS->FS merging, so very few file handles are used.
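
For reference, here is a more self-contained sketch of that first
approach.  It is only a sketch: the index path, the StandardAnalyzer,
and the makeDocument() helper are all assumptions of mine, so adjust
them to your setup:

  import java.io.IOException;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;
  import org.apache.lucene.index.IndexWriter;

  public class BulkIndexer {
    public static void main(String[] args) throws IOException {
      IndexWriter writer = new IndexWriter("/tmp/index", new StandardAnalyzer(), true);

      writer.mergeFactor = 100000;   // buffer 100,000 one-doc segments in RAM
      writer.maxMergeDocs = 100000;  // and never merge disk segments beyond that

      for (int i = 0; i < args.length; i++)
        writer.addDocument(makeDocument(args[i]));

      writer.mergeFactor = 100;                  // back to normal merging
      writer.maxMergeDocs = Integer.MAX_VALUE;
      writer.optimize();                         // merge everything down on disk
      writer.close();
    }

    // Hypothetical helper: build a Document from one input item.
    static Document makeDocument(String text) {
      Document doc = new Document();
      doc.add(Field.Text("contents", text));
      return doc;
    }
  }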

A more efficient and slightly more complex approach would be to build large
indexes in RAM, and copy them to disk with IndexWriter.addIndexes:
  IndexWriter fsWriter = new IndexWriter(new File(...), analyzer, true);
  while (... more docs to index ...) {
    RAMDirectory ramDir = new RAMDirectory();
    IndexWriter ramWriter = new IndexWriter(ramDir, analyzer, true);
    ... add 100,000 docs to ramWriter ...
    ramWriter.optimize();                             // collapse to one segment
    ramWriter.close();
    fsWriter.addIndexes(new Directory[] { ramDir });  // copy the batch to disk
  }
  fsWriter.optimize();
  fsWriter.close();

Note that addIndexes is broken in the release, so use a nightly build to
try this approach.
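
If it helps, here is the same loop with the batching made explicit.
Again just a sketch: it assumes a nightly build (working addIndexes), a
StandardAnalyzer, and a hypothetical Iterator of pre-built Document
objects:

  import java.io.File;
  import java.io.IOException;
  import java.util.Iterator;
  import org.apache.lucene.analysis.Analyzer;
  import org.apache.lucene.analysis.standard.StandardAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.store.Directory;
  import org.apache.lucene.store.RAMDirectory;

  public class BatchedIndexer {
    static final int BATCH = 100000;   // documents per in-memory index

    // docs is a hypothetical iterator over pre-built Document objects.
    public static void index(Iterator docs) throws IOException {
      Analyzer analyzer = new StandardAnalyzer();
      IndexWriter fsWriter = new IndexWriter(new File("/tmp/index"), analyzer, true);
      while (docs.hasNext()) {
        RAMDirectory ramDir = new RAMDirectory();
        IndexWriter ramWriter = new IndexWriter(ramDir, analyzer, true);
        for (int i = 0; i < BATCH && docs.hasNext(); i++)
          ramWriter.addDocument((Document) docs.next());
        ramWriter.optimize();
        ramWriter.close();
        fsWriter.addIndexes(new Directory[] { ramDir });
      }
      fsWriter.optimize();
      fsWriter.close();
    }
  }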

If you try these, please report back on how well they work.

Doug

