Hi all,

I understand Lucene indexes perform at their optimum up to a certain size - said to be around several GBs. I haven't found a good discussion of this, but my understanding is that at some point it's better to split an index into parts (a la sharding) than to continue searching one huge index. I assume this has to do with OS and I/O configuration. Can anyone point me to more info on this?
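
To make that concrete, here's the kind of thing I mean by searching the split parts as one - a minimal sketch only, assuming a recent Lucene API; the paths and the two-part split are made up for illustration:

    import java.nio.file.Paths;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.MultiReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.FSDirectory;

    public class ShardedSearch {
        public static IndexSearcher openSearcher() throws Exception {
            // One reader per physical part of the split index.
            IndexReader part1 = DirectoryReader.open(FSDirectory.open(Paths.get("/indexes/part1")));
            IndexReader part2 = DirectoryReader.open(FSDirectory.open(Paths.get("/indexes/part2")));
            // MultiReader presents the parts as one logical index, so the
            // search code never has to know how the data is physically split.
            return new IndexSearcher(new MultiReader(part1, part2));
        }
    }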

We have a product that uses Lucene for various searches, and at the moment each type of search uses its own Lucene index. We plan to refactor this and combine all the indexes into one, making the whole system more robust and reducing its memory footprint, among other things.
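
For reference, the combined index we have in mind would tag each document with its search type, roughly like this (field names are invented for illustration, and again this assumes a recent Lucene API):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    public class CombinedIndexExample {
        // Each document carries a "type" keyword so one physical index
        // can serve every kind of search we run today.
        public static Document makeDoc(String type, String text) {
            Document doc = new Document();
            doc.add(new StringField("type", type, Field.Store.NO));
            doc.add(new TextField("body", text, Field.Store.YES));
            return doc;
        }

        // At search time, restrict whatever the user asked for to one type.
        public static Query restrictToType(Query userQuery, String type) {
            return new BooleanQuery.Builder()
                    .add(userQuery, BooleanClause.Occur.MUST)
                    .add(new TermQuery(new Term("type", type)), BooleanClause.Occur.FILTER)
                    .build();
        }
    }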

Assuming the above is true, we are interested in knowing how to do this correctly. Initially all our data will live in one big index, but if at some index size there is severe performance degradation, we would like to handle that correctly - either by starting a new FSDirectory index to flush into, or by re-indexing and moving the large parts into their own Lucene index.
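
The rollover idea, as I picture it, would be something like the sketch below - a rough sketch only; the 4 GB threshold and the directory layout are placeholders, not numbers I trust:

    import java.io.IOException;
    import java.nio.file.Paths;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;

    public class RollingIndex {
        private static final long MAX_BYTES = 4L << 30; // placeholder 4 GB cut-over

        // Open a writer on the current "generation" directory; when the
        // size check below trips, we'd bump the generation and start fresh.
        public static IndexWriter openWriter(int generation) throws IOException {
            FSDirectory dir = FSDirectory.open(Paths.get("/indexes/gen" + generation));
            return new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));
        }

        // Sum the on-disk file sizes of an index to decide when to roll over.
        public static boolean needsRollover(FSDirectory dir) throws IOException {
            long total = 0;
            for (String file : dir.listAll()) {
                total += dir.fileLength(file);
            }
            return total > MAX_BYTES;
        }
    }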

Are there any guidelines for measuring or estimating this correctly? What should we be aware of while considering all this? We can't assume anything about the machine running it, so testing won't really tell us much...

Thanks in advance for any input on this,

Itamar.

