Re: MergePolicy public but SegmentInfos package protected?

Marvin Humphrey Fri, 27 Mar 2009 10:13:10 -0700

On Fri, Mar 27, 2009 at 12:39:05PM -0400, Michael McCandless wrote:

> Why must merge policy be made public for realtime search? [In Lucy]


Because real-time search under Lucy needs to be able to operate using multiple
write processes, since threads will not always be available.

You need to be able to tell one indexer *not* to merge anything when
performing fast updates, and you need to be able to tell another indexer what
to merge when performing background consolidation.

Looking down from a high level, what I think will work is to supply an
"IndexManager" argument to the indexer's constructor which controls all
merge-related behavior, and to provide FastUpdateManager and
BackgroundMergeManager classes which implement the desired policies.
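
To make the shape of that idea concrete, here is a minimal sketch in Python.
It is only an illustration, not an existing Lucy API: the method name
`segments_to_merge`, the `Segment` record, the toy `Indexer`, and the
`max_small_docs` threshold are all hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for per-segment metadata."""
    name: str
    doc_count: int


class IndexManager:
    """Hypothetical base class: decides which segments an indexer merges."""

    def segments_to_merge(self, segments: List[Segment]) -> List[Segment]:
        raise NotImplementedError


class FastUpdateManager(IndexManager):
    """Never merges, so a fast update only has to write its new segment."""

    def segments_to_merge(self, segments: List[Segment]) -> List[Segment]:
        return []


class BackgroundMergeManager(IndexManager):
    """Consolidates small segments; the threshold is an arbitrary assumption."""

    def __init__(self, max_small_docs: int = 1000):
        self.max_small_docs = max_small_docs

    def segments_to_merge(self, segments: List[Segment]) -> List[Segment]:
        small = [s for s in segments if s.doc_count <= self.max_small_docs]
        # A lone small segment has nothing to be merged with.
        return small if len(small) > 1 else []


class Indexer:
    """Toy indexer that defers every merge decision to its manager."""

    def __init__(self, manager: IndexManager):
        self.manager = manager

    def plan_merge(self, segments: List[Segment]) -> List[Segment]:
        return self.manager.segments_to_merge(segments)
```

One process would construct its indexer with a FastUpdateManager and a
separate process would use a BackgroundMergeManager over the same segment
list, so the two never compete over what to merge.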

> > Actually, if you're not warming sort caches, launching a Lucene IndexReader
> > isn't obscenely expensive any more -- just expensive.  Right?
> 
> We load deleted docs on init (1 bit per doc = fast), terms index (=
> a lot of stuff every 128 terms = maybe slow), norms on the first search
> that hits that field (1 byte per doc = probably OK), and FieldCache on
> first search that uses it.  So "it depends" I guess?

For the purposes of MergePolicy, all you would need are the doc counts and the
delcounts, and optionally other stuff in SegmentInfos.  In theory you could
lazy load the other stuff like the term dictionary index.  Obviously that
would be an unacceptable behavioral change, but it's worth noting.
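
The lazy-loading idea can be sketched as follows, again in Python and purely
as an illustration. This is not how Lucene's IndexReader behaves; the class
`LazySegmentReader`, its `load_terms_index` callable, and `deleted_ratio` are
hypothetical names made up for the example.

```python
from functools import cached_property


class LazySegmentReader:
    """Hypothetical reader: cheap counts up front, terms index on demand."""

    def __init__(self, doc_count, del_count, load_terms_index):
        self.doc_count = doc_count
        self.del_count = del_count
        # Callable that performs the slow load; not invoked here.
        self._load_terms_index = load_terms_index

    @cached_property
    def terms_index(self):
        # Runs once, on first access; the result is cached on the instance.
        return self._load_terms_index()

    def deleted_ratio(self):
        # Uses only the two counts, so the terms index is never touched.
        return self.del_count / self.doc_count if self.doc_count else 0.0
```

A merge decision that only calls deleted_ratio() never pays for the terms
index; the first search that reads terms_index does.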

Marvin Humphrey

