Re: MergePolicy public but SegmentInfos package protected?

Marvin Humphrey Thu, 26 Mar 2009 18:52:00 -0700

On Thu, Mar 26, 2009 at 07:06:26AM -0400, Michael McCandless wrote:

> We'd need to add a few methods to IndexReader,

Eep.  IndexReader's too big as it is.

> eg querying whether
> compound file format is in use, whether separate norms are stored,
> "get me total size in bytes of all files" (or maybe just "get me all
> files", plus utility method somewhere to add up the sizes), so this
> approach seems doable.

Do you really need all that?  I think the crucial info is already available:

  * The number of docs in each segment.
  * The number of deletions in each segment, allowing you to calculate the
    deletion percentage.

I think it's reasonable to assume that the average document size is about the
same in every segment.  Sure, that'll be wrong at the long tail of the curve,
but most of the time it will be right -- and even when it's not, it won't cause
big problems.
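
To make that concrete, here is a rough sketch in Java.  The class and method
names (SegmentStats, DocCountSizing, estimatedShare) are made up for
illustration and are not part of Lucene's API.  It derives the deletion
percentage from the two counts and uses the live doc count as a stand-in for
segment size, on the assumption that documents are about the same size in
every segment.

```java
import java.util.List;

/** Hypothetical per-segment stats holder; NOT a Lucene class. */
final class SegmentStats {
    final String name;
    final int docCount;  // all docs in the segment, including deleted ones
    final int delCount;  // docs marked as deleted

    SegmentStats(String name, int docCount, int delCount) {
        if (docCount < 0 || delCount < 0 || delCount > docCount) {
            throw new IllegalArgumentException("bad counts for " + name);
        }
        this.name = name;
        this.docCount = docCount;
        this.delCount = delCount;
    }

    /** Docs that would survive a merge. */
    int liveDocs() {
        return docCount - delCount;
    }

    /** Percentage of the segment's docs that are deleted (0 for an empty segment). */
    double deletionPercent() {
        return docCount == 0 ? 0.0 : 100.0 * delCount / docCount;
    }
}

/** Hypothetical helper: sizes segments by doc count instead of bytes on disk. */
final class DocCountSizing {
    /**
     * Fraction of the index's live docs held by one segment.  Under the
     * assumption that docs are about the same size everywhere, this
     * approximates the segment's share of the index's live data.
     */
    static double estimatedShare(SegmentStats seg, List<SegmentStats> all) {
        long totalLive = 0;
        for (SegmentStats s : all) {
            totalLive += s.liveDocs();
        }
        return totalLive == 0 ? 0.0 : (double) seg.liveDocs() / totalLive;
    }
}
```

A merge policy built this way would only need the two counts per segment,
not file sizes or format details.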

> But: we don't yet have IndexWriter holding open a reader for every
> segment.  We are working on realtime search (LUCENE-1516), but even
> then, if you don't ask for a realtime reader from IndexWriter, it
> won't hold open SegmentReaders for all segments.

Yeah, that's gonna be a bigger problem.  :(  It's cake to give Lucy's indexer
a reader, because opening readers is cheap.  But the Lucene heavy-IndexReader
model messes that up -- IndexWriter has traditionally been a fast class to
open.

Marvin Humphrey


