Sorry, I don't think there is a need to use any top-level ordinals. None of these docvalues-based query implementations need them.
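A toy sketch of why segment-local ordinals are enough, assuming each segment is modelled as a plain sorted array of its distinct terms. The class `SegmentOrdinalMatcher` and its method are hypothetical names for illustration and are not Lucene API. `String.compareTo` stands in for the byte-wise term ordering a real index would use.

```java
import java.util.BitSet;

/** Toy model: a "segment" is a sorted array of its distinct terms (not Lucene API). */
class SegmentOrdinalMatcher {

    /**
     * Merges a presorted query term list against one segment's sorted term
     * dictionary and marks the segment-local ordinal of every match.
     * No ordinal is ever translated into a global (top-level) ordinal space.
     */
    static BitSet matchingOrdinals(String[] segmentTerms, String[] sortedQueryTerms) {
        BitSet ords = new BitSet(segmentTerms.length);
        int s = 0; // position in the segment's term dictionary (the local ordinal)
        int q = 0; // position in the query's sorted term list
        while (s < segmentTerms.length && q < sortedQueryTerms.length) {
            int cmp = segmentTerms[s].compareTo(sortedQueryTerms[q]);
            if (cmp == 0) {
                ords.set(s);
                s++;
                q++;
            } else if (cmp < 0) {
                s++; // segment term is smaller, advance the segment side
            } else {
                q++; // query term is absent from this segment
            }
        }
        return ords;
    }

    public static void main(String[] args) {
        String[] query = {"B001", "B003", "B007"};
        String[] segmentA = {"B001", "B002", "B003"};
        String[] segmentB = {"B005", "B007", "B009"};
        // The same query resolves to different ordinals in each segment.
        System.out.println(matchingOrdinals(segmentA, query)); // {0, 2}
        System.out.println(matchingOrdinals(segmentB, query)); // {1}
    }
}
```

Each segment builds its own small bitset, and a document matches when its segment-local ordinal is set in that segment's bitset.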
As for a query intersecting an input stream, that is a big no-go. Lucene Queries need to have correct hashCode/equals/etc. That's why the current code around this, such as TermInSetQuery, encodes everything into a PrefixCodedTerms.

On Tue, Oct 26, 2021 at 4:57 PM Joel Bernstein <joels...@gmail.com> wrote:
>
> One more wrinkle for extremely large lists, is pass the list in as an
> InputStream which is a presorted binary representation of the ASIN's and
> slide a BytesRef across the stream and merge it with the SortedDocValues.
> This saves on all the object creation and String overhead for really long
> lists of id's.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Tue, Oct 26, 2021 at 4:50 PM Joel Bernstein <joels...@gmail.com> wrote:
>>
>> If the list of ASIN's is presorted you can quickly merge it with the
>> SortedDocValues and produce a FixedBitSet of the top level ordinals, which
>> can be used as the post filter. This is a nice approach for things like
>> passing in a long list of access control predicates.
>>
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>>
>> On Tue, Oct 26, 2021 at 3:52 PM Adrien Grand <jpou...@gmail.com> wrote:
>>>
>>> I opened https://issues.apache.org/jira/browse/LUCENE-10207 about these
>>> ideas.
>>>
>>> On Tue, Oct 26, 2021 at 7:52 PM Robert Muir <rcm...@gmail.com> wrote:
>>>>
>>>> On Tue, Oct 26, 2021 at 1:37 PM Adrien Grand <jpou...@gmail.com> wrote:
>>>> >
>>>> > > And then we could make an IndexOrDocValuesQuery with both the
>>>> > > TermInSetQuery and this SDV.newSlowInSetQuery?
>>>> >
>>>> > Unfortunately IndexOrDocValuesQuery relies on the fact that the "index"
>>>> > query can evaluate its cost (ScorerSupplier#cost) without doing anything
>>>> > costly, which isn't the case for TermInSetQuery.
>>>> >
>>>> > So we'd need to make some changes. Estimating the cost of a
>>>> > TermInSetQuery in general without seeking the terms is a hard problem,
>>>> > but maybe we could specialize the unique key case to return the number
>>>> > of terms as the cost?
>>>>
>>>> Yes we know each term in terms dict only has a single document, when
>>>> terms.size() == terms.getSumDocFreq(): there's only one posting for
>>>> each term.
>>>> But we can probably generalize a cost estimation a bit more, just
>>>> based on these two stats?
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>>
>>>
>>>
>>> --
>>> Adrien
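
For illustration only, here is one way the cost estimate discussed in the quoted thread could be sketched from just the two field statistics, terms.size() and terms.getSumDocFreq(). The class `TermSetCostEstimate`, its method, and the averaging heuristic are assumptions made for this sketch and are not what Lucene implements. The inputs are plain longs, so no Lucene classes are required.

```java
/** Hypothetical cost heuristic built only from terms.size() and terms.getSumDocFreq(). */
class TermSetCostEstimate {

    /**
     * @param queryTermCount number of terms in the query's term set
     * @param termCount      distinct terms in the field (terms.size(), -1 if unknown)
     * @param sumDocFreq     total postings in the field (terms.getSumDocFreq())
     * @return an estimated upper bound on the number of matching postings
     */
    static long estimateCost(long queryTermCount, long termCount, long sumDocFreq) {
        if (queryTermCount <= 0 || sumDocFreq <= 0) {
            return 0;
        }
        // Unknown term count, or more query terms than the field holds:
        // the query cannot match more postings than the field contains.
        if (termCount <= 0 || queryTermCount >= termCount) {
            return sumDocFreq;
        }
        // Average postings per term, rounded up. This is exactly 1 in the
        // unique-key case, where termCount == sumDocFreq.
        long avgDocFreq = (sumDocFreq + termCount - 1) / termCount;
        return Math.min(queryTermCount * avgDocFreq, sumDocFreq);
    }

    public static void main(String[] args) {
        // Unique key field: the cost is simply the number of query terms.
        System.out.println(estimateCost(1_000, 5_000_000, 5_000_000)); // 1000
        // About three postings per term on average.
        System.out.println(estimateCost(1_000, 1_000_000, 3_000_000)); // 3000
    }
}
```

An average like this can be far off when the postings are heavily skewed, so it is a starting point for discussion rather than a tested estimator.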