How long is it taking to merge your 5GB index? Do you have any stats about disk utilization during merge (seeks/second, bytes transferred/second)? Did you try buffer sizes even larger than 1MB? Are you writing to a different disk, as suggested?
I'll do some more testing tonight and get back to you.
Note that right now this var is final and not public... so that will probably need to change.
Perhaps. I'm reluctant to make it too easy to change this. People tend to randomly tweak every available knob and then report bugs, or, if it doesn't crash, start recommending that everyone else tweak the knob as they do. There are lots of tradeoffs with buffer size, including cases that folks might not think of (for example, a wildcard query creates a buffer for every term that matches).
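To put rough numbers on the wildcard case: if each matching term holds its own read buffer, the memory in use is simply terms times buffer size. Below is a minimal sketch of that arithmetic in plain Java. The class name, the helper, the term count, and the buffer sizes are all made up for illustration; none of them come from Lucene.

```java
// Hypothetical helper, not part of Lucene: estimates the memory held by
// per-term read buffers when a query expands to many terms.
final class BufferMemoryEstimate {

    // Total bytes = one buffer per matching term.
    // multiplyExact throws ArithmeticException on overflow instead of wrapping.
    static long estimateBytes(long matchingTerms, long bufferSizeBytes) {
        return Math.multiplyExact(matchingTerms, bufferSizeBytes);
    }

    public static void main(String[] args) {
        long terms = 10_000; // assumed number of terms matched by a wildcard
        long[] bufferSizes = {1024, 64 * 1024, 1024 * 1024}; // assumed sizes to compare
        for (long size : bufferSizes) {
            System.out.printf("%,d terms x %,d-byte buffer = %,d bytes%n",
                    terms, size, estimateBytes(terms, size));
        }
    }
}
```

With these assumed figures, a 1MB buffer and 10,000 matching terms would come to roughly 10GB of buffers, which is the kind of tradeoff that is easy to miss when tuning only for merge speed.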
Or you can do what I do and recompile ;)
Right now we are merging to the same disk... I'll perform some real benchmarks with this var too. Long term we're going to migrate to using two SCSI disks per machine and then doing parallel queries across them with optimized indexes.
Does it make sense to also increase the OutputStream.BUFFER_SIZE? This would seem to make sense since an optimize is a large number of reads and writes.
It might help a little if you're merging to the same disk as you're reading from, but probably not a lot. If you're merging to a different disk then it shouldn't make much difference at all.
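One way to get a first feel for how much the buffer size matters on a given machine is a plain file copy with different chunk sizes. The sketch below uses only java.io and java.nio, not Lucene's own stream classes. The class name, file size, and buffer sizes are assumptions chosen for illustration. Because the temp file is small and freshly written, it will likely be served from the OS cache, so treat the timings as a lower bound and rerun against real index files on the actual disks.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical micro-benchmark, not part of Lucene.
final class BufferSizeBenchmark {

    // Copies src to dst using reads and writes of bufferSize bytes.
    // Returns the number of bytes copied.
    static long copy(Path src, Path dst, int bufferSize) throws IOException {
        long total = 0;
        byte[] chunk = new byte[bufferSize];
        try (InputStream in = new FileInputStream(src.toFile());
             OutputStream out = new FileOutputStream(dst.toFile())) {
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("bench-src", ".bin");
        Path dst = Files.createTempFile("bench-dst", ".bin");
        try {
            Files.write(src, new byte[16 * 1024 * 1024]); // assumed 16MB test file
            for (int size : new int[] {1024, 64 * 1024, 1024 * 1024}) {
                long start = System.nanoTime();
                long bytes = copy(src, dst, size);
                long millis = (System.nanoTime() - start) / 1_000_000;
                System.out.printf("buffer=%,d bytes copied=%,d time=%d ms%n",
                        size, bytes, millis);
            }
        } finally {
            Files.deleteIfExists(src);
            Files.deleteIfExists(dst);
        }
    }
}
```

To compare same-disk and different-disk merging, point src and dst at paths on the relevant disks instead of the temp directory.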
Also, with modern disk controllers and filesystems, I'm not sure how much difference this should make. Both Reiser and XFS do a lot of internal buffering, as does our disk controller. I guess I'll find out...
Kevin
--
Please reply using PGP.
http://peerfear.org/pubkey.asc NewsMonster - http://www.newsmonster.org/
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
AIM/YIM - sfburtonator, Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster
