Re: Interleaving and new Lucene formats

Robert Muir Sat, 16 Feb 2013 05:35:53 -0800

On Sat, Feb 16, 2013 at 8:19 AM, Sebastiano Vigna <vi...@di.unimi.it> wrote:
>
> I never asked for that. It looks like you're entirely missing my point.
> Which is to do a fair benchmark between radically different implementations
> of an index structure.


"It would also be important for me to force PForDelta everywhere"

>
>>
>> That's right. Also keep in mind: in the FOR case the blocks themselves
>> are interleaved, so you have a block of 128 doc deltas, then a block
>> of 128 freqs, then 128 doc deltas again, then 128 freqs.
>> Finally, the vint remainder is docs+freqs interleaved as vints.
>
>
> OK, we are slowly getting there.
>
> So the question is: do you decode interleaved freqs blocks *always*, or do
> you do it *lazily* when freqs are actually used?

They are decoded only when they are present in the index and were
requested up front, when the enumerator was obtained.
Currently scorers do request them, because they use them for scoring.
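
To make the layout and the lazy decoding concrete, here is a toy Java
sketch. It is not Lucene's code: the class InterleavedPostingsSketch and
its methods are made up for illustration, and values are kept as plain
ints rather than bit-packed blocks and vints. Only the ordering (128 doc
deltas, 128 freqs, and so on, then an interleaved remainder) follows the
description above. When freqs are not requested, the decoder steps over
each freq block without reading it.

```java
/** Toy model of the interleaved layout; hypothetical, not Lucene's postings format code. */
final class InterleavedPostingsSketch {
    static final int BLOCK_SIZE = 128;

    /** Decoded postings; freqs is null when they were not requested. */
    static final class Postings {
        final int[] docs;
        final int[] freqs;
        Postings(int[] docs, int[] freqs) { this.docs = docs; this.freqs = freqs; }
    }

    /** Writes [128 doc deltas][128 freqs]... followed by an interleaved (delta, freq) remainder. */
    static int[] encode(int[] docs, int[] freqs) {
        int n = docs.length;
        int[] out = new int[2 * n];
        int pos = 0, prev = 0, i = 0;
        while (n - i >= BLOCK_SIZE) {
            for (int j = 0; j < BLOCK_SIZE; j++) {   // block of doc deltas
                out[pos++] = docs[i + j] - prev;
                prev = docs[i + j];
            }
            for (int j = 0; j < BLOCK_SIZE; j++) {   // block of freqs
                out[pos++] = freqs[i + j];
            }
            i += BLOCK_SIZE;
        }
        for (; i < n; i++) {                         // remainder: delta, freq, delta, freq, ...
            out[pos++] = docs[i] - prev;
            prev = docs[i];
            out[pos++] = freqs[i];
        }
        return out;
    }

    /** Reads count postings back; freq blocks are only read when wantFreqs is true. */
    static Postings decode(int[] data, int count, boolean wantFreqs) {
        int[] docs = new int[count];
        int[] freqs = wantFreqs ? new int[count] : null;
        int pos = 0, doc = 0, i = 0;
        while (count - i >= BLOCK_SIZE) {
            for (int j = 0; j < BLOCK_SIZE; j++) {
                doc += data[pos++];
                docs[i + j] = doc;
            }
            if (wantFreqs) {
                for (int j = 0; j < BLOCK_SIZE; j++) {
                    freqs[i + j] = data[pos++];
                }
            } else {
                pos += BLOCK_SIZE;                   // step over the whole freq block unread
            }
            i += BLOCK_SIZE;
        }
        for (; i < count; i++) {
            doc += data[pos++];
            docs[i] = doc;
            int f = data[pos++];                     // interleaved, so this slot must be passed either way
            if (wantFreqs) {
                freqs[i] = f;
            }
        }
        return new Postings(docs, freqs);
    }
}
```

Calling decode(data, count, false) returns the same doc ids as
decode(data, count, true), with freqs left null.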

If you want to remove frequencies from the equation, you can:

1. omit them completely at indexing time:

    FieldType ft = new FieldType(TextField.TYPE_NOT_STORED);
    ft.setIndexOptions(IndexOptions.DOCS_ONLY);
    Field field = new Field("body", "body contents", ft);

2. index them, but specify when you request the DocsEnum that you won't
ask for them, and just use it to iterate documents:

    TermsEnum termsEnum = reader.terms("body").iterator(null);
    // seekExact returns false if the term does not exist in this field
    boolean found = termsEnum.seekExact(new BytesRef("dogs"), false);
    // pass 0 as the flags argument, to not ask for frequencies
    DocsEnum docsEnum = termsEnum.docs(reader.getLiveDocs(), null, 0);
