"Michael D. Curtin" <[EMAIL PROTECTED]> wrote on 12/06/2006 03:49:53 PM: > Nadav Har'El wrote: > > > What I couldn't figure out how to use, however, was the abundant memory (2 > > GB) that this machine has. > > > > I tried playing with IndexWriter.setMaxBufferedDocs(), and noticed that > > there is no speed gain after I set it to 1000, at which point the running > > Lucene takes up just 70 MB of memory, or 140 MB for the two threads. > > It may not be a Lucene limit per se, but a JVM limit instead. What are you > using for the JVM's heap (via -Xms and -Xmx switches)? For example, I often > run with java -Xmx1000m to let the heap grow to a gigabyte, if necessary.
Sure, I used -Xmx500m, for example, but Java still only grew to 70 MB.
When I continued to increase setMaxBufferedDocs(), Java's memory use grew,
but performance did not continue to improve (it even dropped slightly; I
don't know why). So apparently the problem isn't Java's memory limit; it's
Lucene simply not wishing to use more memory.

> Might I also suggest that you not try to index all of this data in a single
> invocation of a Java program.  That is, index a portion, say 10 GB at a time,
> and then use AddIndexes() later to bring them together.  Set your granularity
> based on the amount of time you can stand to do work over, when the seemingly
> inevitable problems crop up.  It sure would stink to be working on the 1000th
> GB of input data and have a power supply go out, and then have to start all
> the way over from the beginning!  Other checkpointing schemes are possible,
> too, if you have the time and inclination to be more clever ...

Good idea. I've put rough sketches of both the writer setup and the
split-and-merge approach below my signature.

Thanks,
Nadav.
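For the archives, here is roughly the shape of the indexing setup I've been
describing. This is only a sketch against the Lucene 2.x-style API; the class
name, paths, field name, analyzer, and the document loop are placeholders,
not my real code:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class IndexOneChunk {
    public static void main(String[] args) throws Exception {
        // Launched with something like:  java -Xmx500m IndexOneChunk
        // -Xmx only raises the ceiling; how much heap actually gets used
        // depends on how many documents IndexWriter buffers in RAM.
        IndexWriter writer =
            new IndexWriter("/bigdisk/index-part-0", new StandardAnalyzer(), true);

        // Buffer this many documents in memory before flushing a segment.
        // Past roughly 1000 I saw no further speedup (about 70 MB of RAM
        // per indexing thread at that point).
        writer.setMaxBufferedDocs(1000);

        for (int i = 0; i < 100000; i++) {   // placeholder document source
            Document doc = new Document();
            doc.add(new Field("body", "document text " + i,
                              Field.Store.NO, Field.Index.TOKENIZED));
            writer.addDocument(doc);
        }

        writer.optimize();
        writer.close();
    }
}

The point being that setMaxBufferedDocs(), not -Xmx, is what actually decides
how much of the heap the writer will touch.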
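Here is also a rough sketch of the split-and-merge idea Michael suggests:
index one chunk per run of the program, then combine the partial indexes with
addIndexes(). Again just a sketch; the class name, chunk directories, and
their number are placeholders:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MergeChunks {
    public static void main(String[] args) throws Exception {
        // Each partial index was written by a separate run of the indexer,
        // so it acts as a checkpoint: if a run dies, only that chunk has
        // to be redone, not everything indexed before it.
        Directory[] parts = new Directory[] {
            FSDirectory.getDirectory("/bigdisk/index-part-0", false),
            FSDirectory.getDirectory("/bigdisk/index-part-1", false),
            FSDirectory.getDirectory("/bigdisk/index-part-2", false),
        };

        IndexWriter merged =
            new IndexWriter("/bigdisk/index-merged", new StandardAnalyzer(), true);
        merged.addIndexes(parts);   // merge the partial indexes into one
        merged.close();
    }
}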