Re: Per-document Payloads

Nicolas Lalevée Tue, 30 Oct 2007 05:30:20 -0800

On Monday 29 October 2007, Michael McCandless wrote:
> "Michael Busch" <[EMAIL PROTECTED]> wrote:
> > Michael McCandless wrote:
> > > Michael, are you thinking that the storage would/could be non-sparse
> > > (like norms), and loaded/cached once in memory, especially for fixed
> > > size fields?  EG a big array of ints of length maxDocID?  In John's
> > > original case, every doc has this UID int field; I think this is
> > > fairly common.
> >
> > Yes I agree, this is a common use case. In my first mail in this thread
> > I suggested to have a flexible format. Non-sparse, like norms, in case
> > every document has one value and all values have the same fixed size.
> > Sparse and with a skip list if one or both conditions are false.
> >
> > The DocumentsWriter would have to check whether both conditions are
> > true, in which case it would store the values non-sparse. The
> > SegmentMerger would only write the non-sparse format for the new segment
> > if all of the source segments also had the non-sparse format with the
> > same value size.
> >
> > This would provide the most flexibility for the users I think.
>
> OK, got it.  So in the case where I always put a field "UID" on every
> document, always a 4-byte binary field, then Lucene will "magically"
> store this as non-sparse column-stride field for every segment.
>
> But I still have to mark the Field as "column-stride storage" right?


It depends on what the API should look like. Either Lucene itself supports
every format, in which case you have to explicitly bind fields to a format, or
you open up the API so that the Lucene user chooses how to store their data.

As said earlier in the thread, some work has already been done on the second
option: LUCENE-662
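
To make the first option concrete, here is a rough sketch of the selection
rule Michael Busch describes in the quoted text: non-sparse only when every
document has a value and all values share one fixed size, and on merge only
when every source segment is already non-sparse with the same value size.
None of these names exist in Lucene. The class, the enum, the -1 marker for a
missing value, and the handling of empty input are all assumptions made for
illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: none of these types or methods are Lucene APIs.
public class ColumnStrideFormatSketch {

    enum Format { NON_SPARSE, SPARSE }

    /** What a writer would know about one field in one segment (hypothetical). */
    static final class SegmentFieldInfo {
        final Format format;
        final int valueSize; // fixed size in bytes; -1 when the format is SPARSE

        SegmentFieldInfo(Format format, int valueSize) {
            this.format = format;
            this.valueSize = valueSize;
        }
    }

    private static SegmentFieldInfo sparse() {
        return new SegmentFieldInfo(Format.SPARSE, -1);
    }

    /**
     * Flush-time rule: non-sparse only if every document has a value and all
     * values have the same size. valueSizes[doc] is the value length in bytes,
     * or -1 if the document has no value (an assumed encoding).
     */
    static SegmentFieldInfo chooseOnFlush(int[] valueSizes) {
        if (valueSizes.length == 0 || valueSizes[0] < 0) {
            return sparse(); // assumption: an empty segment falls back to sparse
        }
        int first = valueSizes[0];
        for (int size : valueSizes) {
            if (size != first) {
                return sparse(); // a missing value (-1) or a different size
            }
        }
        return new SegmentFieldInfo(Format.NON_SPARSE, first);
    }

    /**
     * Merge-time rule: keep non-sparse only if all source segments are
     * non-sparse and agree on the value size.
     */
    static SegmentFieldInfo chooseOnMerge(List<SegmentFieldInfo> sources) {
        if (sources.isEmpty()) {
            return sparse();
        }
        int size = sources.get(0).valueSize;
        for (SegmentFieldInfo info : sources) {
            if (info.format != Format.NON_SPARSE || info.valueSize != size) {
                return sparse();
            }
        }
        return new SegmentFieldInfo(Format.NON_SPARSE, size);
    }

    public static void main(String[] args) {
        // A 4-byte UID on every document, as in the case discussed above.
        SegmentFieldInfo a = chooseOnFlush(new int[] {4, 4, 4});
        // One document without the field.
        SegmentFieldInfo b = chooseOnFlush(new int[] {4, -1, 4});
        System.out.println(a.format + " " + b.format);
        System.out.println(chooseOnMerge(Arrays.asList(a, b)).format);
    }
}
```

With the second option, a decision like this would live in user code rather
than inside the DocumentsWriter and SegmentMerger.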
 
Nicolas

