One more wrinkle for extremely large lists: pass the list in as an
InputStream containing a presorted binary representation of the ASINs,
slide a BytesRef across the stream, and merge it with the SortedDocValues.
This avoids the object creation and String overhead that come with really
long lists of IDs.
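
A minimal sketch of that merge, for illustration only. It is plain Java
rather than Lucene code: a sorted byte[][] stands in for the
SortedDocValues term dictionary (array index = ordinal), a reused scratch
buffer stands in for the BytesRef, and java.util.BitSet stands in for
FixedBitSet. The stream layout (2-byte big-endian length followed by the
id bytes) and the names SortedIdMerge, matchOrdinals and encode are
assumptions made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;

final class SortedIdMerge {

  // Walks the presorted id stream and the sorted dictionary in lockstep.
  // Bit i of the result is set when dict[i] also appears in the stream.
  static BitSet matchOrdinals(byte[][] dict, InputStream sortedIds) {
    DataInputStream in = new DataInputStream(sortedIds);
    BitSet ords = new BitSet(dict.length);
    byte[] buf = new byte[64]; // reused scratch buffer, no per-id objects
    int ord = 0;
    try {
      while (ord < dict.length) {
        int len;
        try {
          len = in.readUnsignedShort(); // assumed 2-byte length prefix
        } catch (EOFException endOfList) {
          break;
        }
        if (len > buf.length) {
          buf = new byte[len];
        }
        in.readFully(buf, 0, len);
        // Advance the dictionary while its current term sorts before the id.
        int cmp = -1;
        while (ord < dict.length
            && (cmp = Arrays.compareUnsigned(dict[ord], 0, dict[ord].length, buf, 0, len)) < 0) {
          ord++;
        }
        if (ord < dict.length && cmp == 0) {
          ords.set(ord);
          ord++;
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return ords;
  }

  // Demo helper: writes ids in the assumed length-prefixed layout.
  // The caller must pass them already sorted.
  static byte[] encode(String... sortedIds) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (String id : sortedIds) {
      byte[] utf8 = id.getBytes(StandardCharsets.UTF_8);
      out.write(utf8.length >>> 8);
      out.write(utf8.length & 0xFF);
      out.write(utf8, 0, utf8.length);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    byte[][] dict = {
      "B001".getBytes(StandardCharsets.UTF_8),
      "B002".getBytes(StandardCharsets.UTF_8),
      "B005".getBytes(StandardCharsets.UTF_8),
      "B009".getBytes(StandardCharsets.UTF_8)
    };
    InputStream ids = new ByteArrayInputStream(encode("B000", "B002", "B009", "B010"));
    System.out.println(matchOrdinals(dict, ids));
  }
}
```

Because both sides are sorted, each id and each dictionary entry is
visited at most once, and the only allocation on the hot path is the
occasional scratch buffer resize.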

Joel Bernstein
http://joelsolr.blogspot.com/


On Tue, Oct 26, 2021 at 4:50 PM Joel Bernstein <joels...@gmail.com> wrote:

> If the list of ASINs is presorted, you can quickly merge it with the
> SortedDocValues and produce a FixedBitSet of the top-level ordinals, which
> can be used as the post filter. This is a nice approach for things like
> passing in a long list of access control predicates.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Tue, Oct 26, 2021 at 3:52 PM Adrien Grand <jpou...@gmail.com> wrote:
>
>> I opened https://issues.apache.org/jira/browse/LUCENE-10207 about these
>> ideas.
>>
>> On Tue, Oct 26, 2021 at 7:52 PM Robert Muir <rcm...@gmail.com> wrote:
>>
>>> On Tue, Oct 26, 2021 at 1:37 PM Adrien Grand <jpou...@gmail.com> wrote:
>>> >
>>> > > And then we could make an IndexOrDocValuesQuery with both the
>>> > > TermInSetQuery and this SDV.newSlowInSetQuery?
>>> >
>>> > Unfortunately IndexOrDocValuesQuery relies on the fact that the
>>> > "index" query can evaluate its cost (ScorerSupplier#cost) without doing
>>> > anything costly, which isn't the case for TermInSetQuery.
>>> >
>>> > So we'd need to make some changes. Estimating the cost of a
>>> > TermInSetQuery in general without seeking the terms is a hard problem, but
>>> > maybe we could specialize the unique key case to return the number of terms
>>> > as the cost?
>>>
>>> Yes, we know each term in the terms dict has only a single document when
>>> terms.size() == terms.getSumDocFreq(): there's only one posting for
>>> each term.
>>> But we can probably generalize the cost estimation a bit more, just
>>> based on these two stats?
>>>
>>>
>>
>> --
>> Adrien
>>
>
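
For the cost estimate discussed in the quoted messages, one possible
reading of "based on these two stats" is to scale the number of query
terms by the average number of postings per term, sumDocFreq / size. The
sketch below is plain Java with hypothetical names (TermInSetCostSketch,
estimateCost, queryTermCount); its first two inputs correspond to
Terms.size() and Terms.getSumDocFreq(). The formula and the -1
"unknown" convention are assumptions for illustration, not something the
thread settles on.

```java
final class TermInSetCostSketch {

  // Estimates how many postings a term-set query would visit, using only
  // field-level statistics and without seeking any term.
  // Returns -1 when the statistics are not usable.
  static long estimateCost(long termCount, long sumDocFreq, long queryTermCount) {
    if (termCount <= 0 || sumDocFreq < 0) {
      return -1; // statistics unavailable
    }
    if (termCount == sumDocFreq) {
      // Unique-key case: exactly one posting per term.
      return queryTermCount;
    }
    // General case (assumed): query terms times average postings per term.
    double avgDocFreq = (double) sumDocFreq / termCount;
    return (long) Math.ceil(queryTermCount * avgDocFreq);
  }

  public static void main(String[] args) {
    System.out.println(estimateCost(1000, 1000, 50)); // unique key field
    System.out.println(estimateCost(1000, 3000, 50)); // three postings per term on average
  }
}
```

An average hides skew: a handful of very frequent terms can make the real
cost far higher than this estimate, which is part of why the general case
is described above as a hard problem.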
