Hi Greg, I think the general issue is one of the API: ValueSource seems really geared toward returning values from single-valued fields.

IMO, for the way the API is used (e.g. sorting), it makes sense to define a selector that works in O(1) time per document, and use these existing value sources:

https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MultiValuedIntFieldSource.java
https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MultiValuedLongFieldSource.java
https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MultiValuedFloatFieldSource.java
https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MultiValuedDoubleFieldSource.java

These require that you specify a "selector" that decides which value will be the "stuckee" (designated value) for the doc:

https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/SortedNumericSelector.java

I strongly recommend "min", as it can just read the first DV for each doc.

For terms (strings), there is a similar thing:

https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/SortedSetFieldSource.java

And again, it has available selectors:

https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/SortedSetSelector.java

I would still strongly recommend "min", to just read the first DV for each doc.

On Tue, Oct 26, 2021 at 7:49 PM Greg Miller <[email protected]> wrote:
>
> Hi folks-
>
> Out of curiosity, is there a reason Lucene doesn't have
> implementations for concepts like DoubleValues / DoubleValuesSource
> that support multiple values per document? Or maybe something like
> this does exist in Lucene that I'm not aware of? I can't believe this
> hasn't been a topic of discussion at least once, but I couldn't turn
> up a past Jira issue.
>
> I ask because most of the faceting implementations in Lucene allow the
> user to provide their own xxValuesSource to use instead of assuming
> the data is in an indexed field, but there's an inherent limitation
> here forcing documents to have a single value. The faceting
> implementations have all been updated to operate correctly for
> multi-valued documents when referencing an indexed field, but there's
> a bit of a gap here if the user wants to supply their own source.
>
> Many thanks!
>
> Cheers,
> -Greg
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
>
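
P.S. To make the "min" recommendation a bit more concrete, here is a toy sketch in plain Java. It is not Lucene code: the class SelectorSketch and its methods are made up for illustration, and it assumes each doc's values are stored in ascending order and that every doc has at least one value. Under those assumptions, "min" is a single read of the first value, while "max" over a forward-only cursor has to step past every value.

```java
import java.util.Arrays;

/** Toy model of per-document sorted numeric values; hypothetical, not Lucene code. */
final class SelectorSketch {

    /** Stand-in for one doc's values, kept in ascending order (assumed storage order). */
    static long[] sortedValues(long... raw) {
        long[] copy = raw.clone();
        Arrays.sort(copy);
        return copy;
    }

    /** "min" selector: one read of the first stored value, however many values the doc has. */
    static long selectMin(long[] sorted) {
        return sorted[0]; // assumes the doc has at least one value
    }

    /** "max" selector modeled on a forward-only cursor: walks every value to reach the last. */
    static long selectMax(long[] sorted) {
        long last = sorted[0]; // assumes the doc has at least one value
        for (int i = 1; i < sorted.length; i++) {
            last = sorted[i];
        }
        return last;
    }

    public static void main(String[] args) {
        long[] doc = sortedValues(42L, 7L, 19L);
        System.out.println(selectMin(doc)); // prints 7
        System.out.println(selectMax(doc)); // prints 42
    }
}
```

In Lucene itself the equivalent choice would be passing SortedNumericSelector.Type.MIN (or SortedSetSelector.Type.MIN for terms) to one of the value sources linked above; as far as I recall, their constructors take the field name and the selector type, but please check the linked sources rather than trusting this sketch.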
