On Fri, Mar 27, 2009 at 12:39:05PM -0400, Michael McCandless wrote:
> Why must merge policy be made public for realtime search? [In Lucy]
Because real-time search under Lucy needs to work with multiple writer processes, since threads will not always be available. You need to be able to tell one indexer *not* to merge anything when performing fast updates, and you need to be able to tell another indexer what to merge when performing background consolidation.

At a high level, what I think will work is to supply an "IndexManager" argument to the indexer's constructor which controls all merge-related behavior, and to provide FastUpdateManager and BackgroundMergeManager classes which implement the desired policies.

> > Actually, if you're not warming sort caches, launching a Lucene IndexReader
> > isn't obscenely expensive any more -- just expensive. Right?
>
> We load deleted docs on init (1 bit per doc = fast), terms index (=
> a lot of stuff every 128 terms = maybe slow), norms on the first search
> that hits that field (1 byte per doc = probably OK), and FieldCache on
> first search that uses it. So "it depends" I guess?

For the purposes of MergePolicy, all you would need are the doc counts and the deletion counts, and optionally other stuff in SegmentInfos. In theory you could lazy-load everything else, such as the term dictionary index. Obviously that would be an unacceptable behavioral change, but it's worth noting.

Marvin Humphrey
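A minimal Java sketch of the IndexManager idea, under stated assumptions: IndexManager, FastUpdateManager and BackgroundMergeManager are the names proposed above, but the method chooseSegmentsToMerge, the SegmentStats record, the Indexer class and the numeric thresholds are all hypothetical, not Lucy's or Lucene's actual API. It also reflects the point that a merge decision can be made from per-segment doc counts and deletion counts alone.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeManagerSketch {

    // Hypothetical per-segment summary: only the doc count and the deletion count.
    record SegmentStats(String name, int docCount, int delCount) {
        int liveDocs() { return docCount - delCount; }
    }

    // Hypothetical interface: the indexer delegates every merge decision to this.
    interface IndexManager {
        List<SegmentStats> chooseSegmentsToMerge(List<SegmentStats> segments);
    }

    // Fast updates: never merge anything, so each commit stays cheap.
    static final class FastUpdateManager implements IndexManager {
        @Override
        public List<SegmentStats> chooseSegmentsToMerge(List<SegmentStats> segments) {
            return List.of();
        }
    }

    // Background consolidation: pick small segments and heavily deleted ones.
    // The policy and both thresholds are illustrative assumptions.
    static final class BackgroundMergeManager implements IndexManager {
        private final int smallSegmentThreshold;
        private final double maxDeletedFraction;

        BackgroundMergeManager(int smallSegmentThreshold, double maxDeletedFraction) {
            this.smallSegmentThreshold = smallSegmentThreshold;
            this.maxDeletedFraction = maxDeletedFraction;
        }

        @Override
        public List<SegmentStats> chooseSegmentsToMerge(List<SegmentStats> segments) {
            List<SegmentStats> chosen = new ArrayList<>();
            for (SegmentStats s : segments) {
                double deletedFraction =
                    s.docCount() == 0 ? 0.0 : (double) s.delCount() / s.docCount();
                if (s.liveDocs() < smallSegmentThreshold || deletedFraction > maxDeletedFraction) {
                    chosen.add(s);
                }
            }
            return chosen;
        }
    }

    // Hypothetical indexer: the manager passed to the constructor controls merging.
    static final class Indexer {
        private final IndexManager manager;

        Indexer(IndexManager manager) { this.manager = manager; }

        List<SegmentStats> planMerges(List<SegmentStats> segments) {
            return manager.chooseSegmentsToMerge(segments);
        }
    }

    static List<String> names(List<SegmentStats> segments) {
        return segments.stream().map(SegmentStats::name).toList();
    }

    public static void main(String[] args) {
        List<SegmentStats> segments = List.of(
            new SegmentStats("seg_1", 100_000, 500),
            new SegmentStats("seg_2", 800, 0),
            new SegmentStats("seg_3", 50_000, 30_000));

        Indexer fast = new Indexer(new FastUpdateManager());
        Indexer background = new Indexer(new BackgroundMergeManager(1_000, 0.5));

        System.out.println("fast: " + names(fast.planMerges(segments)));
        System.out.println("background: " + names(background.planMerges(segments)));
    }
}
```

Running main should print `fast: []` and then `background: [seg_2, seg_3]`: the fast-update indexer never merges, while the background indexer picks the small segment and the mostly-deleted one. Swapping the manager object, rather than subclassing the indexer, lets two processes share one indexer implementation while following different merge policies.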