Optimization is disk bound -- it will read the whole index and write it back. If the 7 minutes it took to optimize your index is not acceptable, get a faster hard drive (faster RPM, seek time, etc.)
Btw, 3000 documents is small, but if they *all* (or most) are being updated every 3-5 minutes, you will run into fragmentation issues (and many segment files), as you discovered.

-- George

-----Original Message-----
From: Dean Harding [mailto:[email protected]]
Sent: Tuesday, June 30, 2009 7:03 PM
To: [email protected]
Subject: RE: 40000 segments for index with 2000 documents

> There are about 3000 documents with one field indexed that are being
> updated 3-5 times per minute. It looks like a new segment is created for
> each transaction, because right now there are about 40000 .cfs/.del
> (coupled) files, which makes 80000 files in the index, and the index
> size is about 25Mb. But after optimization (which took 7 minutes) the
> index size shrunk to 350Kb.

So what's the performance like after optimization?

Optimization doesn't happen automatically in Lucene; you must do it manually. Adding a document simply appends it to the end of the index, and removing a document simply marks it as deleted. Updating a document is a remove-then-add operation. It's only when you call Optimize() that it actually rearranges things on disk for faster access, and that's something you should be doing on a regular basis. Here, we do an Optimize() after every 1000 "modifications" (add, delete, update).

For a relatively small index like yours, regular optimization shouldn't take more than a couple of seconds (it's only because you let things get so out of hand that it took 7 minutes), and you can continue to query the index while the optimization is happening. At least, that's always been my understanding.

Dean.
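A minimal sketch of the pattern Dean describes, assuming a Lucene.Net 2.x-style API; the IndexUpdater class, the "id"/"content" field names, and the 1000-modification threshold are illustrative, not something from the thread:

    using Lucene.Net.Analysis.Standard;
    using Lucene.Net.Documents;
    using Lucene.Net.Index;
    using Lucene.Net.Store;

    public class IndexUpdater
    {
        private readonly IndexWriter _writer;
        private int _modificationsSinceOptimize = 0;
        private const int OptimizeThreshold = 1000; // optimize after every 1000 modifications

        public IndexUpdater(Directory directory)
        {
            // 'false' = open an existing index rather than create a new one.
            _writer = new IndexWriter(directory, new StandardAnalyzer(), false);
        }

        // An "update" in Lucene is really a delete-by-term followed by an add.
        public void UpdateDocument(string id, string content)
        {
            Document doc = new Document();
            doc.Add(new Field("id", id, Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.Add(new Field("content", content, Field.Store.YES, Field.Index.TOKENIZED));

            _writer.UpdateDocument(new Term("id", id), doc);
            CountModification();
        }

        public void DeleteDocument(string id)
        {
            _writer.DeleteDocuments(new Term("id", id));
            CountModification();
        }

        private void CountModification()
        {
            if (++_modificationsSinceOptimize >= OptimizeThreshold)
            {
                // Merges the segments down and expunges deleted documents, which
                // keeps the segment/file count from growing without bound.
                _writer.Optimize();
                _modificationsSinceOptimize = 0;
            }
        }

        public void Close()
        {
            _writer.Close();
        }
    }

Whether 1000 is the right threshold depends on your update rate and index size; the point is simply to bound how many segments accumulate between optimizations instead of letting them pile up to 40000.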
