On Sat, Nov 29, 2008 at 12:45 PM, Michael Stoppelman <[EMAIL PROTECTED]> wrote: > Hi all, > > I've got an indexing issue I think other folks might be interested in > hearing about and I wanted to get feedback before I went ahead and > implemented a new method. > > Currently, the way we update indices is by sending individual delete/add > document requests to all our search boxes individually. Each box is doing > about 20-30qps while this is happening. The problem I'm seeing is that when > a segment from the index is merged [honestly I don't know that much about > segment merging] (our merge factor is set to 5) and an old highly used > segment of the index is lost from the disk cache; most of the search > requests to that box get prohibitively slow 10-80+ secs and I see pg/in + > pg/out stats spike sar.
> I'm planning on implementing a method similar to the > SOLR model using the rsync method that Doug Cutting outlined a long time ago > on this list and forcing the new files into the disk cache using fadvice. FYI, if you don't actually want to use rsync, Solr has very recently implemented this in pure Java. But forcing new files into the cache means ejecting currently used files from the cache (for queries currently in flight). Doesn't seem any way to really win here if you don't have enough memory to hold the critical parts of both indexes. I think Mike pointed out in a recent post that it would be nice to advise the kernel not to cache files being written during merging. In Solr, we would want to allow the same thing when copying over new segment files during replication. Unfortunately, I don't think there is a way to do this through Java. -Yonik > Is there another strategy here? Could I create a merge policy that forces > new segments into the disk cache before lucene nukes the old ones? > > Thanks, > M --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]