Hi, We are using lucene to index a large number of documents (millions) and we currently optimize half the index in the background every 2 days, to stop it becoming too fragmented. This takes about an hour and we are finding during this time searches are slowed down dramatically on that machine. This is not due to CPU as it is a dual CPU box, so I'm thinking it must be the large amounts of IO being used to optimize the index.
I was wondering if anyone has any ideas for alleviating this problem? One option I've come up with is to slowly copy the index to a second second offline box, optimize there and then slowly copy the newly optimized index back onto the search box. To slow down the IO so that bandwidth and IO are not maxed out I thought I could use something like the linux Traffic Control (tc) program http://tldp.org/HOWTO/Traffic-Control-HOWTO/elements.html#e-shaping (see also http://gentoo-wiki.com/HOWTO_Apache_2_bandwidth_limiting ) or tar, nfs and http://www.ivarch.com/programs/quickref/pv.shtml and its rate-limit option to limit how quickly the index directory is copied to and from the remote machine. This option doesn't seem ideal as it would involve other programs, servers and scripts. The other option is to do it all within the existing Java program, by rate-limiting/throttling the IO of the lucene Directory being used to do the optimize. I've done this in Lucene by extending the FSDirectory and the FSIndexOutput classes and putting a small sleep in the FSIndexOutput.flushBuffer, and it seems to work OK. I'm not that keen on copying and modifying lucene code though, because I'll have to check and possibly modify it every time I upgrade lucene, so if there is a reasonable alternative I'd be interested in hearing anyone's ideas? If people think IO throttling FSDirectory may be a good idea and useful for them, I could develop it more and possibly contact lucene-dev to look into getting it added to the lucene trunk? Cheers steve --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]