Hi,

On Mon, Oct 7, 2013 at 9:31 PM, Rose, Stuart J <stuart.r...@pnnl.gov> wrote:
> Is there an optimal way to access many document TermVectors (in the same
> chunk) consecutively when using the LZ4 termvector compression?
>
> I'm curious to know whether all TermVectors in a single compressed chunk are
> decompressed and cached when one TermVector in the same chunk is accessed?

The main use cases for term vectors today are more-like-this and
highlighting, so term vectors are generally accessed in no particular order.
This is why we don't cache the uncompressed chunk (it would never get
reused), so you need to decompress every time you retrieve a document or
its term vectors.

> Also wondering if there is a mapping of TermVector order to docID order? Or
> is it always one to one? If docIds are dynamic, then presumably they are not
> necessarily in the same order as their documents' corresponding term
> vectors...

Term vectors are stored in doc ID order, meaning that within a given
segment, the term vectors for document N are followed by the term vectors
for document N+1.

-- 
Adrien
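
To make the two points above concrete, here is a toy Java model of chunked storage without a cache. It is only a sketch and not Lucene's actual code: the class name `ChunkedVectorsSketch`, the fixed number of documents per chunk, and the placeholder strings are assumptions made for illustration. Decompression is simulated by copying an array rather than running LZ4, and the model assumes that each lookup decompresses the whole chunk.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of chunked term vector storage; hypothetical, not Lucene's code. */
public class ChunkedVectorsSketch {
    // Each chunk holds the entries for a run of consecutive doc IDs.
    private final List<String[]> chunks = new ArrayList<>();
    // Assumption: a fixed number of documents per chunk, for simplicity.
    private final int docsPerChunk;
    private int decompressions = 0;

    public ChunkedVectorsSketch(int numDocs, int docsPerChunk) {
        this.docsPerChunk = docsPerChunk;
        // Entries are laid out in doc ID order: doc N is followed by doc N+1.
        for (int start = 0; start < numDocs; start += docsPerChunk) {
            int len = Math.min(docsPerChunk, numDocs - start);
            String[] chunk = new String[len];
            for (int i = 0; i < len; i++) {
                chunk[i] = "vectors-for-doc-" + (start + i);
            }
            chunks.add(chunk);
        }
    }

    /** Stands in for decompressing a chunk; the result is never cached. */
    private String[] decompress(int chunkIndex) {
        decompressions++;
        return chunks.get(chunkIndex).clone();
    }

    /** Looks up one document; pays for a chunk decompression on every call. */
    public String get(int docId) {
        String[] chunk = decompress(docId / docsPerChunk);
        return chunk[docId % docsPerChunk];
    }

    public int decompressions() {
        return decompressions;
    }

    public static void main(String[] args) {
        ChunkedVectorsSketch store = new ChunkedVectorsSketch(8, 4);
        // Doc IDs 0..3 all live in the first chunk.
        for (int docId = 0; docId < 4; docId++) {
            System.out.println(store.get(docId));
        }
        System.out.println("decompressions: " + store.decompressions());
    }
}
```

In this model, reading doc IDs 0 to 3 from the same chunk triggers four decompressions, one per lookup, because nothing is kept between calls.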