[jira] [Commented] (LUCENE-2962) Skip data should be inlined into the postings lists

Han Jiang (JIRA) Tue, 23 Apr 2013 05:17:20 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13638989#comment-13638989 ]


Han Jiang commented on LUCENE-2962:
-----------------------------------

And... sorry Mike, and sorry to all of you, for being so hasty in handing in
the proposal.

I really would like to share my thoughts and findings with all of you. But for
this issue as a GSoC project, I'm quite doubtful about how much improvement we
would ultimately gain. When skip data is interleaved between docID/freq blocks,
some performance loss on non-skip queries still seems unavoidable. And the
one-level-skipper experiment above shows that we should be really cautious
about sacrificing simplicity to introduce a more complex skip-list structure.

I'd be really grateful if someone can see further and take on this issue :).
But if it is still unassigned after the GSoC period, I'll be very glad to do
more experiments on it. :)
                
> Skip data should be inlined into the postings lists
> ---------------------------------------------------
>
>                 Key: LUCENE-2962
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2962
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: Michael McCandless
>              Labels: gsoc2013
>         Attachments: proposal.txt
>
>
> Today, we store all skip data as a separate blob at the end of a given term's 
> postings (if that term occurs in enough docs to warrant skip data).
> But this adds overhead during decoding -- we have to seek to a different
> place for the initial load, we have to init separate readers, we have to seek
> again while using the lower levels of the skip data, etc.  Also, we have to
> fully decode all skip information even if we are not going to use it (e.g. if I
> only want docIDs, I still must decode position offset and lastPayloadLength).
> If instead we interleaved skip data into the postings file, we could keep it
> local, and "private" to each file that needs skipping.  This should make it
> less costly to init and then use the skip data, which'd be a good perf gain
> for e.g. PhraseQuery, AndQuery.
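The interleaved layout described above can be illustrated with a toy sketch. It assumes a hypothetical in-memory format (a plain int[] stream, not Lucene's actual postings codec): each block of doc-ID deltas is preceded by an inlined skip entry holding the block's last doc ID and its length, so advance() can hop over whole blocks without decoding them. The class and member names (InlinedSkipPostings, encode, nextDoc, advance, decodedDeltas) are illustrative only, loosely mirroring DocIdSetIterator naming.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Toy postings iterator with skip data inlined ahead of each block.
 * Hypothetical layout, NOT Lucene's real on-disk format:
 *   per block: [lastDocInBlock, numDeltas, delta_1, ..., delta_n]
 * Deltas are relative to the previous doc ID, across block boundaries.
 */
final class InlinedSkipPostings {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] data;       // encoded stream
    private int pos = 0;            // next int to read
    private int blockEnd = 0;       // one past the current block's last delta
    private int blockLastDoc = -1;  // inlined skip entry of the current block
    private int doc = -1;           // current doc ID
    int decodedDeltas = 0;          // how many deltas were actually decoded

    InlinedSkipPostings(int[] data) {
        this.data = data;
    }

    /** Encodes strictly increasing, non-negative doc IDs in blocks of blockSize. */
    static int[] encode(int[] docIds, int blockSize) {
        List<Integer> out = new ArrayList<>();
        int prev = -1;
        for (int start = 0; start < docIds.length; start += blockSize) {
            int end = Math.min(start + blockSize, docIds.length);
            out.add(docIds[end - 1]); // inlined skip entry: last doc ID in block
            out.add(end - start);     // number of deltas that follow
            for (int i = start; i < end; i++) {
                out.add(docIds[i] - prev);
                prev = docIds[i];
            }
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Reads the next block's inlined header; false when the stream is exhausted. */
    private boolean readHeader() {
        if (pos >= data.length) {
            return false;
        }
        blockLastDoc = data[pos++];
        int numDeltas = data[pos++];
        blockEnd = pos + numDeltas;
        return true;
    }

    /** Sequential iteration: still reads every block header it crosses. */
    int nextDoc() {
        if (pos == blockEnd && !readHeader()) {
            return doc = NO_MORE_DOCS;
        }
        decodedDeltas++;
        doc += data[pos++];
        return doc;
    }

    /** Returns the first doc ID >= target, skipping whole blocks via their headers. */
    int advance(int target) {
        while (true) {
            if (pos == blockEnd && !readHeader()) {
                return doc = NO_MORE_DOCS;
            }
            if (blockLastDoc >= target) {
                break;
            }
            doc = blockLastDoc; // base for the next block's first delta
            pos = blockEnd;     // jump over this block without decoding it
        }
        while (doc < target) {  // linear scan inside the candidate block
            decodedDeltas++;
            doc += data[pos++];
        }
        return doc;
    }

    public static void main(String[] args) {
        int[] data = encode(new int[] {2, 5, 9, 14, 20, 27, 35, 44, 54}, 3);
        System.out.println(Arrays.toString(data)); // [9, 3, 3, 3, 4, 27, 3, 5, 6, 7, 54, 3, 8, 9, 10]
        InlinedSkipPostings it = new InlinedSkipPostings(data);
        System.out.println(it.advance(30) + " after decoding " + it.decodedDeltas + " delta(s)"); // 35 after decoding 1 delta(s)
    }
}
```

In this sketch, advance(30) reads three headers but decodes only one delta, because the header of each skipped block also supplies the base doc ID for the next block. On the other hand, nextDoc() must still read two header ints per block even when it never skips, which is the kind of per-block cost on non-skip queries raised in the comment above. The sketch also has only a single skip level and no compression, unlike the multi-level skip data described in the issue.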
