Nadav Har'El wrote:

> What I couldn't figure out how to use, however, was the abundant memory
> (2 GB) that this machine has.
>
> I tried playing with IndexWriter.setMaxBufferedDocs(), and noticed that
> there is no speed gain after I set it to 1000, at which point the running
> Lucene takes up just 70 MB of memory, or 140 MB for the two threads.
>
> Is there a way for Lucene to make use of the much larger memory I have, to
> speed up the indexing process? Does having a huge memory somehow improve
> the speed of huge merges, for example?

It may not be a Lucene limit per se, but a JVM limit instead. What are you using for the JVM's heap (via -Xms and -Xmx switches)? For example, I often run with java -Xmx1000m to let the heap grow to a gigabyte, if necessary.
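
For concreteness, here is roughly how the heap switch and Lucene's document buffer combine. This is only a sketch, assuming the 1.9/2.x-style IndexWriter constructor; the index path and buffer size are placeholders you would tune yourself:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class BigBufferIndexer {
        public static void main(String[] args) throws Exception {
            // Launch with something like:  java -Xms512m -Xmx1500m BigBufferIndexer
            // so the JVM is actually allowed to use most of the machine's 2 GB
            // (leave some headroom for the OS and for merges).
            IndexWriter writer = new IndexWriter("/path/to/index",
                                                 new StandardAnalyzer(), true);

            // Buffer more documents in RAM before each segment is flushed to
            // disk; larger values spend more heap but write fewer, larger
            // initial segments, so there is less merging later.
            writer.setMaxBufferedDocs(10000);

            // ... addDocument() calls go here ...

            writer.close();
        }
    }

Whether a larger buffer actually helps past the point you already measured is something only an experiment will tell, but at least the heap ceiling won't be the thing stopping you.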

Might I also suggest that you not try to index all of this data in a single invocation of a Java program. That is, index a portion, say 10 GB at a time, and then use addIndexes() later to bring the pieces together. Set the granularity by how much work you can stand to redo when the seemingly inevitable problems crop up. It sure would stink to be working on the 1000th GB of input data, have a power supply go out, and then have to start all the way over from the beginning! Other checkpointing schemes are possible too, if you have the time and inclination to be more clever ...
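
Roughly, that scheme looks like the following. Again just a sketch, assuming the older-style IndexWriter and FSDirectory API; the chunk paths are placeholders and the per-chunk document loop is elided:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class ChunkedIndexer {
        public static void main(String[] args) throws Exception {
            // Each ~10 GB slice of the input goes into its own index directory,
            // so a crash only costs the chunk that was in progress.
            String[] chunkDirs = { "/indexes/part-000", "/indexes/part-001" };

            for (String path : chunkDirs) {
                IndexWriter writer = new IndexWriter(path, new StandardAnalyzer(), true);
                // ... addDocument() calls for this chunk's input files ...
                writer.close();
            }

            // Once all chunks are done, merge the partial indexes into one.
            IndexWriter merged = new IndexWriter("/indexes/final",
                                                 new StandardAnalyzer(), true);
            Directory[] parts = new Directory[chunkDirs.length];
            for (int i = 0; i < chunkDirs.length; i++) {
                // false = open the existing chunk index, don't create a new one
                parts[i] = FSDirectory.getDirectory(chunkDirs[i], false);
            }
            merged.addIndexes(parts);
            merged.close();
        }
    }

In practice you would also record which chunks have completed (a small "done" marker file per directory, say) so that a restart can skip straight past them.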

Good luck!

--MDC
