[jira] [Commented] (LUCENE-5084) EliasFanoDocIdSet

Adrien Grand (JIRA) Wed, 03 Jul 2013 02:12:30 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698767#comment-13698767
 ]


Adrien Grand commented on LUCENE-5084:
--------------------------------------

bq. We could, but in which class? For example, in CachingWrapperFilter it might 
be good to save memory, so it could be there.

This new doc id set might be used for other use-cases in the future, so maybe 
we should have this method on the EliasFanoDocIdSet class?

bq. Also, would the expected size be the only thing to check for? When decoding 
speed is also important, other DocIdSets might be preferable.

Sure, this is something we need to give users control over. For filter caches, it 
is already possible to override CachingWrapperFilter.docIdSetToCache to decide 
whether speed or memory usage is more important. The decision can even depend 
on the cardinality of the set to cache or on its implementation. So we just 
need to provide users with good defaults I think?
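
A minimal sketch of what a cardinality-based default could look like, in plain Java with no Lucene dependency. The class name, the method names and the `minSavingsFactor` parameter are hypothetical and are not taken from the patch. The size formula is the usual textbook estimate for Elias-Fano (low bits plus unary-coded high bits), so the exact accounting in an actual implementation may differ.

```java
final class EliasFanoSizeSketch {

    /** Low bits kept per value: floor(log2(upperBound / numValues)), or 0 for dense sets. */
    static int numLowBits(long numValues, long upperBound) {
        if (numValues <= 0 || upperBound < 0) {
            throw new IllegalArgumentException("need numValues > 0 and upperBound >= 0");
        }
        long ratio = upperBound / numValues;
        return ratio == 0 ? 0 : 63 - Long.numberOfLeadingZeros(ratio);
    }

    /** Approximate encoded size in bits (assumption: textbook estimate, no index overhead). */
    static long estimateEliasFanoBits(long numValues, long upperBound) {
        int lowBits = numLowBits(numValues, upperBound);
        long lowPart = numValues * lowBits;                   // fixed-width low bits
        long highPart = numValues + (upperBound >>> lowBits); // unary-coded high bits
        return lowPart + highPart;
    }

    /** Size in bits of a plain bit set covering doc IDs 0..upperBound. */
    static long bitSetBits(long upperBound) {
        return upperBound + 1;
    }

    /**
     * Hypothetical default: pick the Elias-Fano encoding only when it is at least
     * minSavingsFactor times smaller than a bit set, leaving room to favour speed.
     */
    static boolean preferEliasFano(long numValues, long upperBound, double minSavingsFactor) {
        return estimateEliasFanoBits(numValues, upperBound) * minSavingsFactor
                < bitSetBits(upperBound);
    }
}
```

With these assumptions, 1,000 documents out of 1,000,000 come to an estimated 11,953 bits against 1,000,001 bits for a bit set, while a set containing half of the documents comes out larger than the bit set. A `docIdSetToCache` override could apply a check of this kind before choosing an implementation.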

I haven't run performance benchmarks on this set implementation yet, but if it 
is faster than the DocIdSet iterators of our default postings format, then its 
iterators are not going to be a bottleneck and I think it makes sense to use the 
implementation that saves the most memory. If it is slower, or not faster by a 
sufficient margin, then maybe other implementations such as kamikaze's 
p-for-delta-based doc ID sets (LUCENE-2750) would make more sense as a default.

bq. Can PackedInts.getMutable also be used in a codec?

The PackedInts API can return readers that read directly from an IndexInput, 
if that is the question. But if we want to be able to store the high and low 
bits contiguously, then these readers are not going to be a good fit.
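
For readers unfamiliar with the high/low split mentioned here, the following is a rough plain-Java sketch of the idea, not the code from the patch. The class and field names are hypothetical. It keeps the low bits in an unpacked `long[]` and the high bits in a `java.util.BitSet` purely for readability, and it assumes a sorted (non-decreasing) input small enough for `int` bit positions.

```java
import java.util.BitSet;

final class EliasFanoSplitSketch {
    final int numLowBits;
    final int size;
    final long[] lowBits;  // one entry per value; a real encoder would bit-pack these
    final BitSet highBits; // unary-coded high parts

    EliasFanoSplitSketch(long[] sortedValues, long upperBound) {
        size = sortedValues.length;
        long ratio = size == 0 ? 0 : upperBound / size;
        numLowBits = ratio == 0 ? 0 : 63 - Long.numberOfLeadingZeros(ratio);
        long lowMask = (1L << numLowBits) - 1;
        lowBits = new long[size];
        highBits = new BitSet();
        for (int i = 0; i < size; i++) {
            lowBits[i] = sortedValues[i] & lowMask;
            long high = sortedValues[i] >>> numLowBits;
            // The i-th value sets the bit at position high + i, so positions
            // are strictly increasing even when high parts repeat.
            highBits.set((int) (high + i));
        }
    }

    /** Rebuilds the original values by walking the set bits of the high part. */
    long[] decode() {
        long[] out = new long[size];
        int pos = -1;
        for (int i = 0; i < size; i++) {
            pos = highBits.nextSetBit(pos + 1);
            long high = pos - i; // undo the + i offset applied when encoding
            out[i] = (high << numLowBits) | lowBits[i];
        }
        return out;
    }
}
```

Decoding needs both parts for every value, which is why their layout relative to each other matters when the encoding is stored in a file.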

bq. I considered a decoder that returns ints but that would require a lot more 
casting in the decoder.

OK. I just wanted to have your opinion on this; we can keep everything as a 
long.

bq. I'll open another issue for broadword bit selection later.

Sounds good! I think backwards iteration and efficient skipping should be done 
in separate issues as well. Even without them, this new doc ID set would be a 
very nice addition.
                
> EliasFanoDocIdSet
> -----------------
>
>                 Key: LUCENE-5084
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5084
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Paul Elschot
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: 5.0
>
>         Attachments: LUCENE-5084.patch
>
>
> DocIdSet in Elias-Fano encoding

