Skip data should be inlined into the postings lists
---------------------------------------------------

                 Key: LUCENE-2962
                 URL: https://issues.apache.org/jira/browse/LUCENE-2962
             Project: Lucene - Java
          Issue Type: Improvement
          Components: Index
            Reporter: Michael McCandless


Today, we store all skip data as a separate blob at the end of a given term's 
postings (if that term occurs in enough docs to warrant skip data).

But this adds overhead during decoding: we have to seek to a different place
for the initial load, we have to init separate readers, we have to seek again
while using the lower levels of the skip data, etc.  Also, we have to fully
decode all skip information even if we are not going to use it (e.g., if I only
want docIDs, I still must decode the position offset and lastPayloadLength).
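To make the current layout and its costs concrete, here is a minimal, hypothetical sketch. It is not Lucene's code or file format. It assumes a single skip level, an `int[]` standing in for the on-disk file, and one `int` per encoded value. The doc deltas are written first and the skip entries are appended as one blob at the end. Every entry carries a doc file pointer, a position file pointer, and lastPayloadLength, so a reader that only wants docIDs still has to get through all of them. The class name, `SKIP_INTERVAL`, and the `skipFieldsDecoded` counter are all illustrative inventions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of "skip data as a separate blob after the postings".
// Not Lucene's real format: int[] stands in for the file, one int per value.
final class SeparateSkipLayout {
    static final int SKIP_INTERVAL = 4;    // hypothetical; tiny for the demo
    static final int FIELDS_PER_ENTRY = 4; // lastDoc, docFP, posFP, lastPayloadLength

    final int[] file;    // doc deltas first, then the skip blob
    final int skipStart; // offset of the skip blob (would live in term metadata)
    final int docCount;

    int fp = 0, doc = 0, read = 0; // postings reader state
    int skipFieldsDecoded = 0;     // counts skip values decoded so far

    SeparateSkipLayout(int[] docs) {
        List<Integer> postings = new ArrayList<>();
        List<Integer> skip = new ArrayList<>();
        int last = 0;
        for (int i = 0; i < docs.length; i++) {
            if (i > 0 && i % SKIP_INTERVAL == 0) {
                skip.add(last);            // last docID before this skip point
                skip.add(postings.size()); // doc file pointer of docs[i]
                skip.add(0);               // position file pointer (placeholder)
                skip.add(0);               // lastPayloadLength (placeholder)
            }
            postings.add(docs[i] - last);  // delta-encode docIDs
            last = docs[i];
        }
        skipStart = postings.size();
        postings.addAll(skip);             // skip blob appended after the postings
        file = postings.stream().mapToInt(Integer::intValue).toArray();
        docCount = docs.length;
    }

    /** Returns the first docID >= target (target must exceed the current doc), or -1. */
    int advance(int target) {
        // On disk this is a seek away from the postings, to the skip blob.
        int entries = (file.length - skipStart) / FIELDS_PER_ENTRY;
        int best = -1;
        for (int e = 0; e < entries; e++) {
            int base = skipStart + e * FIELDS_PER_ENTRY;
            // Real entries are variable-length, so all fields are read in
            // sequence even if the caller never needs positions or payloads.
            skipFieldsDecoded += FIELDS_PER_ENTRY;
            if (file[base] >= target) break;
            best = e;
        }
        if (best >= 0 && (best + 1) * SKIP_INTERVAL > read) {
            int base = skipStart + best * FIELDS_PER_ENTRY;
            doc = file[base];
            fp = file[base + 1];
            read = (best + 1) * SKIP_INTERVAL;
        }
        // ...and then a seek back into the postings to scan forward linearly.
        while (read < docCount) {
            doc += file[fp++];
            read++;
            if (doc >= target) return doc;
        }
        return -1;
    }
}
```

For brevity, this sketch rescans the skip blob from its start on every `advance` call, whereas a real skip reader keeps its place. The point it illustrates is the one above: the skip data lives somewhere else, and position and payload fields are paid for even when the caller only wants docIDs.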

If instead we interleaved skip data into the postings files, we could keep it
local, and "private" to each file that needs skipping.  This should make it
less costly to init and then use the skip data, which'd be a good perf gain
for queries like PhraseQuery and AndQuery.
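A minimal, hypothetical sketch of the interleaved idea follows. It is not an actual Lucene format, and all names are illustrative. It assumes a single skip level, an `int[]` standing in for the doc file, and one `int` per encoded value. Each block of `BLOCK_SIZE` doc deltas is preceded by a small inline header holding the block's last docID and its length. That header holds only what doc-skipping needs, so a reader sitting at a block boundary can jump over the whole block with a forward move in the same stream. No separate reader and no seek elsewhere are required.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of skip data inlined into the doc stream:
// [lastDoc, len, delta...][lastDoc, len, delta...]...
// Not Lucene's real format: int[] stands in for the file, one int per value.
final class InterleavedSkipLayout {
    static final int BLOCK_SIZE = 4; // hypothetical; tiny for the demo

    final int[] file;
    int fp = 0;             // read position in the doc stream
    int doc = 0;            // last decoded docID (deltas are relative to it)
    int blockEnd = 0;       // fp where the current block's deltas end
    int headersDecoded = 0; // counts inline skip headers read so far

    InterleavedSkipLayout(int[] docs) {
        List<Integer> out = new ArrayList<>();
        int last = 0;
        for (int start = 0; start < docs.length; start += BLOCK_SIZE) {
            int end = Math.min(start + BLOCK_SIZE, docs.length);
            out.add(docs[end - 1]); // inline skip data: last docID in this block
            out.add(end - start);   // inline skip data: block length in values
            for (int i = start; i < end; i++) {
                out.add(docs[i] - last); // delta-encode docIDs
                last = docs[i];
            }
        }
        file = out.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Returns the first docID >= target (target must exceed the current doc), or -1. */
    int advance(int target) {
        while (true) {
            // Finish decoding the block we are currently inside, if any.
            while (fp < blockEnd) {
                doc += file[fp++];
                if (doc >= target) return doc;
            }
            if (fp >= file.length) return -1;
            // At a block boundary the skip header is right here, inline.
            int lastDocInBlock = file[fp];
            int len = file[fp + 1];
            headersDecoded++;
            fp += 2;
            if (lastDocInBlock < target) {
                fp += len;            // skip the whole block without decoding it
                doc = lastDocInBlock; // keep the delta base correct
                blockEnd = fp;
            } else {
                blockEnd = fp + len;  // target is in this block: decode it
            }
        }
    }
}
```

In this sketch, a position or payload file would carry its own inline headers with its own fields. A docIDs-only consumer would never touch them. A real format would likely also want multiple skip levels for very long postings lists, and this sketch leaves that out.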

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
