On Thu, Dec 16, 2021 at 1:31 PM Robert Muir <[email protected]> wrote:
>
> On Thu, Dec 16, 2021 at 3:53 PM Greg Miller <[email protected]> wrote:
> >
> > TaxonomyReader was recently updated
> > to support bulk ordinal resolution (LUCENE-9476), but SSDV faceting is
> > stuck looking up paths one-at-a-time via SSDV#lookupOrd(ord). This
> > results in a separate TermsEnum#seekExact() call down in
> > Lucene90DocValuesProducer for each ordinal being returned.
> >
>
> I'm confused, where do we do gazillions of lookupOrd(), we should not
> be doing that. The ordinals should be used for all the heavy-duty
> work, and at the very end, only the top-10 or whatever resolved back
> to strings with lookupOrd. Think of it kinda like the stored fields :)

This is right, but we still need to do the lookup for each value being
returned (which is bounded by the top-n param supplied by the user). In
getAllDims, we'll do "n" lookups for every dimension indexed. So while
we're working in "ordinal space" for doing all the counting and such,
there could still be a somewhat sizable number of ordinals that need to
be looked up after counting. This is where taxo-faceting leans on bulk
lookups.

We also call lookupOrd for _every_ ordinal in the given field when
building the state (see the ctor logic in
DefaultSortedSetDocValuesReaderState). I'm not as concerned about this
since state building only needs to happen when the index changes.

> > Having no knowledge about the actual data representation behind the
> > TermsDict in an SSDV field, I'm wondering if someone here can provide
> > a high-level sense of whether-or-not there might be an advantage to
> > looking up ordinals in bulk. I'm going to dig into the code anyway
> > (curious!), but thought I'd raise the idea/question here as well
> > regarding whether-or-not a bulk lookup might be advantageous in
> > general for SSDV fields. Any thoughts?
>
> I don't think we should provide such an API, because the operation is
> slow and should not be done in "bulk" anyway. Number of lookups should
> be low (e.g. 10, 50, whatever the user's top-N is). If you want to
> optimize it, sort them in ascending order and look that up first, but
> honestly in most cases, that probably isn't even worth it.

That's fair. I can see the argument for not wanting to encourage
unnecessary lookups with a "bulk" operation. Thanks for the feedback.
I'll think about this a little more when I have some time to dig into
the code, but what you're saying sounds reasonable.
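
For reference, the "sort them in ascending order" idea quoted above could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from Lucene: the class name SortedOrdLookupSketch and the method resolveInOrdOrder are made up, and the LongFunction<String> parameter is a stand-in for SortedSetDocValues#lookupOrd, which in Lucene returns a BytesRef and can throw IOException. Whether ascending order actually helps depends on how the terms dictionary is encoded, and as noted in the quoted reply it may not be worth it for small top-N values.

```java
import java.util.Arrays;
import java.util.function.LongFunction;

// Hypothetical sketch: resolve a small set of top-N ordinals in ascending
// ordinal order, while returning the labels in the caller's original order.
final class SortedOrdLookupSketch {

  // lookupOrd is a stand-in for SortedSetDocValues#lookupOrd (an assumption
  // made to keep this example free of Lucene dependencies).
  static String[] resolveInOrdOrder(long[] ords, LongFunction<String> lookupOrd) {
    // Sort positions rather than the ordinals themselves, so each label can
    // be written back to the slot its ordinal came from.
    Integer[] positions = new Integer[ords.length];
    for (int i = 0; i < ords.length; i++) {
      positions[i] = i;
    }
    Arrays.sort(positions, (a, b) -> Long.compare(ords[a], ords[b]));

    String[] labels = new String[ords.length];
    for (int pos : positions) {
      // Lookups happen in ascending ordinal order.
      labels[pos] = lookupOrd.apply(ords[pos]);
    }
    return labels;
  }

  public static void main(String[] args) {
    // Toy "terms dictionary": the array index plays the role of the ordinal.
    String[] dict = {"a/blue", "a/green", "a/red", "b/large", "b/small"};
    long[] topN = {4, 0, 2}; // e.g. ordinals ordered by descending count
    String[] labels = resolveInOrdOrder(topN, ord -> dict[(int) ord]);
    System.out.println(Arrays.toString(labels));
  }
}
```

The caller still sees results in count order; only the order in which the dictionary is probed changes.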
