[
https://issues.apache.org/jira/browse/LUCENE-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13698767#comment-13698767
]
Adrien Grand commented on LUCENE-5084:
--------------------------------------
bq. We could, but in which class? For example, in CachingWrapperFilter it might
be good to save memory, so it could be there.
This new doc ID set might be used for other use-cases in the future, so maybe
we should have this method on the EliasFanoDocIdSet class?
bq. Also, would the expected size be the only thing to check for? When decoding
speed is also important, other DocIdSets might be preferable.
Sure, this is something we need to give users control over. For filter caches, it
is already possible to override CachingWrapperFilter.docIdSetToCache to decide
whether speed or memory usage is more important. The decision can even depend
on the cardinality of the set to cache or on its implementation. So we just
need to provide users with good defaults I think?
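To make the "decision can depend on the cardinality" point concrete, here is a
minimal sketch of a memory-based default. It is not Lucene code: the class
DocIdSetChooser, its methods, and the cost model are hypothetical and only
illustrate the kind of check that an override of docIdSetToCache could apply.
It compares a rough Elias-Fano size estimate against a plain bit set of maxDoc
bits and ignores decoding speed entirely.

```java
/** Hypothetical helper, not part of Lucene; names and cost model are assumptions. */
final class DocIdSetChooser {
  enum Kind { BIT_SET, ELIAS_FANO }

  /**
   * Rough Elias-Fano size in bits for `cardinality` doc IDs below `maxDoc`:
   * L low bits per value, plus a unary-coded high part of about
   * cardinality + (maxDoc >>> L) bits, with L = floor(log2(maxDoc / cardinality)).
   */
  static long eliasFanoBits(int cardinality, int maxDoc) {
    if (cardinality == 0) {
      return 0;
    }
    long ratio = Math.max(1, maxDoc / cardinality);
    int lowBits = 63 - Long.numberOfLeadingZeros(ratio); // floor(log2(ratio))
    return (long) cardinality * lowBits + cardinality + (maxDoc >>> lowBits);
  }

  /** Picks whichever representation is estimated to use fewer bits. */
  static Kind choose(int cardinality, int maxDoc) {
    // A plain bit set always costs maxDoc bits, whatever the cardinality.
    return eliasFanoBits(cardinality, maxDoc) < maxDoc ? Kind.ELIAS_FANO : Kind.BIT_SET;
  }
}
```

With this estimate a sparse set (10 docs out of 1,000,000) would pick the
Elias-Fano encoding, while a dense one (500,000 out of 1,000,000) would fall
back to a bit set. A real default would also have to weigh iteration speed.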
I haven't run performance benchmarks on this set implementation yet, but if it
is faster than the doc ID set iterators of our default postings format, then
it is not going to be the bottleneck and I think it makes sense to use the
implementation that saves the most memory. If it is slower, or not
sufficiently faster, then maybe other implementations such as kamikaze's
p-for-delta-based doc ID sets (LUCENE-2750) would make more sense as a default.
bq. Can PackedInts.getMutable also be used in a codec?
The PackedInts API can return readers that can read directly from an
IndexInput, if that is the question. But if we want to be able to store the
high and low bits contiguously, then those readers are not going to be a good
fit.
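For readers unfamiliar with the high/low split mentioned here, the following is
a simplified sketch of Elias-Fano encoding. It is not the code from the
attached patch: the class EliasFanoSketch is hypothetical, it keeps the low
bits unpacked in a long[] for readability (a real implementation would pack
them), and it uses java.util.BitSet for the unary-coded high bits.

```java
import java.util.BitSet;

/** Hypothetical, simplified Elias-Fano sketch; not the LUCENE-5084 patch. */
final class EliasFanoSketch {
  final int numValues;
  final int numLowBits;   // L: number of low bits kept verbatim per value
  final long[] lowBits;   // low L bits of each value (unpacked here for clarity)
  final BitSet highBits;  // high parts in unary: value i sets bit (high_i + i)

  /** sortedValues must be non-decreasing, non-negative, and at most upperBound. */
  EliasFanoSketch(long[] sortedValues, long upperBound) {
    numValues = sortedValues.length;
    long ratio = numValues == 0 ? 0 : upperBound / numValues;
    // L = floor(log2(upperBound / numValues)), or 0 when the set is dense.
    numLowBits = ratio <= 1 ? 0 : 63 - Long.numberOfLeadingZeros(ratio);
    lowBits = new long[numValues];
    highBits = new BitSet();
    long lowMask = (1L << numLowBits) - 1;
    for (int i = 0; i < numValues; i++) {
      lowBits[i] = sortedValues[i] & lowMask;
      long high = sortedValues[i] >>> numLowBits;
      // high is non-decreasing, so high + i is strictly increasing.
      highBits.set((int) (high + i));
    }
  }

  /** Decodes all values by walking the set bits of the high-bits array. */
  long[] decode() {
    long[] out = new long[numValues];
    int pos = -1;
    for (int i = 0; i < numValues; i++) {
      pos = highBits.nextSetBit(pos + 1); // position of the i-th set bit
      long high = pos - i;                // undo the "+ i" offset
      out[i] = (high << numLowBits) | lowBits[i];
    }
    return out;
  }
}
```

The two arrays are accessed together during decoding, which is why storing
them contiguously is attractive for a codec.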
bq. I considered a decoder that returns ints but that would require a lot more
casting in the decoder.
OK. I just wanted to have your opinion on this; we can keep everything as a
long.
bq. I'll open another issue for broadword bit selection later.
Sounds good! I think backwards iteration and efficient skipping should be done
in separate issues as well. Even without them, this new doc ID set would be a
very nice addition.
> EliasFanoDocIdSet
> -----------------
>
> Key: LUCENE-5084
> URL: https://issues.apache.org/jira/browse/LUCENE-5084
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Paul Elschot
> Assignee: Adrien Grand
> Priority: Minor
> Fix For: 5.0
>
> Attachments: LUCENE-5084.patch
>
>
> DocIdSet in Elias-Fano encoding