Re: optimal way to access many TermVectors

Adrien Grand Tue, 08 Oct 2013 00:52:27 -0700

Hi,

On Mon, Oct 7, 2013 at 9:31 PM, Rose, Stuart J <stuart.r...@pnnl.gov> wrote:
> Is there an optimal way to access many document TermVectors (in the same 
> chunk) consecutively when using the LZ4 termvector compression?
>
> I'm curious to know whether all TermVectors in a single compressed chunk are 
> decompressed and cached when one TermVector in the same chunk is accessed?


The main use cases for term vectors today are more-like-this and
highlighting, where term vectors are generally accessed in no particular
order. This is why we don't cache the uncompressed chunk (it would
never get reused), so the chunk is decompressed every time you
retrieve a document or its term vectors.
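
To make the trade-off concrete, here is a toy model (plain Java, no Lucene
dependency) that counts chunk decompressions for a sequence of doc ID
lookups. It is only a sketch under assumptions: the class and method names
are made up, a fixed number of documents per chunk is assumed purely for
illustration, and the "cache last chunk" option is hypothetical and is not
something the codec does.

```java
final class ChunkAccessModel {

    /**
     * Counts how many chunk decompressions a sequence of lookups would cost.
     *
     * docsPerChunk is an illustrative assumption (fixed documents per chunk).
     * cacheLastChunk models a hypothetical one-chunk cache; passing false
     * matches the behaviour described above (decompress on every access).
     */
    static int countDecompressions(int[] docIds, int docsPerChunk, boolean cacheLastChunk) {
        if (docsPerChunk <= 0) {
            throw new IllegalArgumentException("docsPerChunk must be positive");
        }
        int decompressions = 0;
        int cachedChunk = -1; // no chunk held yet
        for (int docId : docIds) {
            int chunk = docId / docsPerChunk;
            if (cacheLastChunk && chunk == cachedChunk) {
                continue; // would be served from the hypothetical cache
            }
            decompressions++;
            cachedChunk = chunk;
        }
        return decompressions;
    }

    public static void main(String[] args) {
        int[] sequential = {0, 1, 2, 3, 4, 5, 6, 7};
        int[] scattered = {0, 4, 1, 5, 2, 6, 3, 7};

        // No cache: one decompression per lookup, whatever the order.
        System.out.println(countDecompressions(sequential, 4, false)); // 8
        System.out.println(countDecompressions(scattered, 4, false));  // 8

        // Hypothetical one-chunk cache: helps only when lookups are consecutive.
        System.out.println(countDecompressions(sequential, 4, true));  // 2
        System.out.println(countDecompressions(scattered, 4, true));   // 8
    }
}
```

In this model a cache pays off only for consecutive lookups; for the
scattered pattern typical of highlighting it saves nothing.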

> Also wondering if there is a mapping of TermVector order to docID order? Or 
> is it always one to one? If docIds are dynamic, then presumably they are not 
> necessarily in the same order as their documents' corresponding term 
> vectors...

Term vectors are stored in doc ID order, meaning that for a given
segment, term vectors for document N are followed by term vectors for
document N+1.
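
If you have a set of top-level doc IDs and want to visit their term vectors
in storage order, one approach is to sort them and resolve each one to a
(segment, segment-local doc ID) pair. The sketch below does this in plain
Java without any Lucene dependency. The class, the Slot holder and the
docBases array are made-up names for illustration; docBases is assumed to
hold the first top-level doc ID of each segment in strictly increasing
order. In Lucene 4.x the corresponding information should be available from
IndexReader.leaves() and AtomicReaderContext.docBase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SegmentOrderedAccess {

    /** Hypothetical holder: a segment index plus a doc ID local to that segment. */
    static final class Slot {
        final int segment;
        final int localDocId;

        Slot(int segment, int localDocId) {
            this.segment = segment;
            this.localDocId = localDocId;
        }
    }

    /** Returns the index of the segment containing globalDocId. */
    static int segmentOf(int globalDocId, int[] docBases) {
        if (docBases.length == 0 || globalDocId < docBases[0]) {
            throw new IllegalArgumentException("doc ID out of range: " + globalDocId);
        }
        int pos = Arrays.binarySearch(docBases, globalDocId);
        // Exact hit: the doc is the first one of that segment.
        // Miss: binarySearch returns -(insertionPoint) - 1, and the owning
        // segment is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Sorts the requested doc IDs and maps each to its segment-local position. */
    static List<Slot> plan(int[] globalDocIds, int[] docBases) {
        int[] sorted = globalDocIds.clone();
        Arrays.sort(sorted); // ascending top-level order == segment order, then local order
        List<Slot> slots = new ArrayList<>(sorted.length);
        for (int docId : sorted) {
            int segment = segmentOf(docId, docBases);
            slots.add(new Slot(segment, docId - docBases[segment]));
        }
        return slots;
    }

    public static void main(String[] args) {
        int[] docBases = {0, 100, 250}; // three segments, illustrative values
        for (Slot s : plan(new int[] {260, 5, 101, 100, 7}, docBases)) {
            System.out.println("segment " + s.segment + ", local doc " + s.localDocId);
        }
    }
}
```

Visiting the resulting slots in order reads each segment's term vectors
front to back. Each lookup still decompresses its chunk, so this only makes
the access pattern sequential; it does not avoid repeated decompression.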

-- 
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
