One way to force larger read-aheads might be to pump up Lucene's input buffer size. As an experiment, try increasing InputStream.BUFFER_SIZE to 1024*1024 or larger. You'll want to do this just for the merge process and not for searching and indexing. That should help you spend more time transferring data and less time wasted on seeks. If that helps, then perhaps we ought to make this settable via a system property or some such.

Good suggestion... it seems about 10% to 15% faster in a few strawman benchmarks I ran.
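For anyone who wants to reproduce this, here is roughly the change involved. It's a sketch, assuming you rebuild Lucene from source; the constant lives in org.apache.lucene.store.InputStream and, as noted below, isn't public:

    // Hypothetical local patch to org.apache.lucene.store.InputStream:
    // a larger buffer means fewer, larger reads during a merge.
    static final int BUFFER_SIZE = 1024 * 1024; // bumped from the shipped default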
How long is it taking to merge your 5GB index? Do you have any stats about disk utilization during merge (seeks/second, bytes transferred/second)? Did you try buffer sizes even larger than 1MB? Are you writing to a different disk, as suggested?
Note that right now this var is final and not public... so that will probably need to change.
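A minimal sketch of what "settable" might look like, assuming we replace the constant with a property lookup; the property name here is made up, and the explicit intValue() is needed on pre-1.5 JVMs:

    // Hypothetical: read the buffer size from a system property,
    // falling back to 1024 (illustrative; use whatever the current default is).
    static final int BUFFER_SIZE =
        Integer.getInteger("org.apache.lucene.buffer.size", 1024).intValue();

The merge process could then be launched with -Dorg.apache.lucene.buffer.size=1048576 and no code changes.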
Perhaps. I'm reluctant to make it too easy to change this. People tend to randomly tweak every available knob and then report bugs, or, if it doesn't crash, start recommending that everyone else tweak the knob as they do. There are lots of tradeoffs with buffer size, and cases that folks might not think of: a wildcard query, for example, creates a buffer for every term that matches, so with a 1MB buffer a query that expands to a few thousand terms would allocate a few gigabytes just for buffers.
Does it make sense to also increase OutputStream.BUFFER_SIZE? An optimize is a large number of both reads and writes, so the same reasoning would seem to apply.
It might help a little if you're merging to the same disk as you're reading from, but probably not a lot. If you're merging to a different disk then it shouldn't make much difference at all.
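Should someone want to test it anyway, the write-side change follows the same pattern; again a hypothetical sketch requiring a rebuild, since this field isn't public either:

    // Hypothetical local patch to org.apache.lucene.store.OutputStream;
    // per the above, mainly worth trying when the merge writes to the
    // same disk it reads from.
    static final int BUFFER_SIZE = 1024 * 1024;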
Doug
