Sridhar,

We have been using approach 2 in our production system with good results. We have separate processes for indexing and searching. The main issue that came up was deleting old indexes (see: http://tinyurl.com/32q8c4). Most of our production problems occur during indexing, and we are able to fix these without interrupting searching at all. This has been a real benefit.
Peter

On Thu, Mar 6, 2008 at 5:30 AM, Sridhar Raman <[EMAIL PROTECTED]> wrote:

> This is my situation. I have an index, which has a lot of search requests
> coming into it. I use just a single instance of IndexSearcher to process
> these requests. At the same time, this index is also getting updated by an
> IndexWriter. And I want these new changes to be reflected _only_ at certain
> intervals. I have thought of a few ways of doing this. Each has its share
> of problems and pluses. I would be glad if someone can help me in figuring
> out the right approach, especially from the performance point of view, as
> the number of documents that will get indexed is pretty large.
>
> Approach 1:
> Have just one copy of the index for both Search & Index. At time T, when I
> need to see the new changes reflected, I close the Searcher, and open it
> again.
> - The re-open of the Searcher might be a bit slow (which I could probably
>   solve by using some warm-up threads).
> - Update and Search on the index at the same time - will this affect the
>   performance?
> - If server crashes before time T, the new Searcher would reflect the
>   changes, which is not acceptable. I want the changes to be reflected only
>   at time T. If server crashes, the index should be the previous T-1 index.
> - Possible problems while optimising the index (as Search is also
>   happening).
> + Just one copy of the index being stored.
>
> Approach 2:
> Keep 2 copies of the index - 1 for Search, 1 for Index. At time T, I just
> switch the Searcher to a copy of the index that is being updated.
> - Before I do the switch to the new index, I need to make a copy of it so
>   that the updates continue to happen on the other index. Is there a
>   convenient way to make this copy? Is it efficient?
> - Time taken to create a new Searcher will still be a problem (but this is a
>   problem in the previous approach as well, and we can live with it).
> + Optimise can happen on an index that is not being read; as a result, its
>   resource requirements would be lower. And probably even the speed of
>   optimisation.
> + Faster search as the index update is happening on a different index.
>
> So, these are the 2 approaches I am contemplating. Any pointers on which
> would be the better approach?
>
> Thanks,
> Sridhar
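
For illustration only, the copy-and-switch step of approach 2 could be sketched roughly as below. This is not taken from any production system and it does not call Lucene at all: it uses only java.nio, and the class name IndexSwitcher and the methods snapshot, switchTo and current are hypothetical. It assumes the index directory is flat (plain files, no subdirectories) and that the indexing process has flushed and is not writing while the copy runs. The search side would open a new IndexSearcher on the path returned by current() after each switch.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper: tracks which index copy the search side should use.
public class IndexSwitcher {
    // Directory currently served to search requests.
    private final AtomicReference<Path> live = new AtomicReference<>();

    public IndexSwitcher(Path initialSnapshot) {
        live.set(initialSnapshot);
    }

    // Directory that a newly opened searcher should read from.
    public Path current() {
        return live.get();
    }

    // Copy every regular file of the index being written into snapshotDir.
    // Assumption: the writer is flushed and idle while this runs.
    public static Path snapshot(Path writeIndex, Path snapshotDir) throws IOException {
        Files.createDirectories(snapshotDir);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(writeIndex)) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    Files.copy(f, snapshotDir.resolve(f.getFileName()),
                               StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
        return snapshotDir;
    }

    // Publish the new snapshot at time T. Returns the previous directory so
    // the caller can delete it once no running search still reads from it.
    public Path switchTo(Path newSnapshot) {
        return live.getAndSet(newSnapshot);
    }
}
```

Because the live directory only changes inside switchTo, a crash before time T leaves the search side on the previous snapshot. Cleaning up the directory returned by switchTo is left to the caller, since it is only safe to delete once the old searcher has been closed.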
