A little history may help...

(this is based on my bad memory, so it could all be wrong, nobody get offended):

At the time, Lucene could only sort on single-valued fields. But Solr and
Elasticsearch would happily sort on multi-valued fields in various hacky
ways, and this typically required large amounts of memory.
IMO, it was important to get docvalues working for most use-cases, but
this "sorting on multi-valued field" was a tricky one, because to me
it is MATHEMATICAL NONSENSE.

But it seemed nobody really cared about how the sorting worked (again,
it is MATHEMATICALLY INSANE anyway), just that users didn't
have to confess whether their fields were single-valued or multi-valued. So
they did stuff like substitute the min value for a forward sort, or the max
value for a reverse sort. These selectors allow you to implement such
a sort if you want. Hopefully MIN is the default and common case, and
you only need MAX in the rare case someone clicks an arrow to reverse
the sort, since MAX requires consuming all the ordinals for each doc :)
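
To make the min/max substitution concrete, here is a rough sketch in plain
Java. It does not use Lucene at all: the class MultiValuedSortSketch, its
Selector enum, and the select/sortDocs methods are made-up names for
illustration only, and each doc's values are assumed to be stored in
ascending order. In Lucene itself the corresponding knobs are the
SortedNumericSelector and SortedSetSelector types linked in the quoted
message below.

```java
import java.util.Arrays;
import java.util.Comparator;

public class MultiValuedSortSketch {

    /** Which value stands in for the whole document when sorting. */
    public enum Selector { MIN, MAX }

    /**
     * Picks the stand-in value for one document.
     * Assumption: the doc's values arrive in ascending order.
     */
    public static long select(long[] sortedValues, Selector selector) {
        if (sortedValues.length == 0) {
            throw new IllegalArgumentException("document has no values");
        }
        if (selector == Selector.MIN) {
            // MIN: the first stored value is the answer, nothing else is read.
            return sortedValues[0];
        }
        // MAX: modeled as a forward-only walk that consumes every value.
        long last = sortedValues[0];
        for (long v : sortedValues) {
            last = v;
        }
        return last;
    }

    /**
     * Returns doc ids in sort order: ascending by MIN for a forward sort,
     * descending by MAX for a reverse sort.
     */
    public static Integer[] sortDocs(long[][] docs, boolean reverse) {
        Selector selector = reverse ? Selector.MAX : Selector.MIN;
        Comparator<Integer> byKey =
            Comparator.comparingLong(d -> select(docs[d], selector));
        if (reverse) {
            byKey = byKey.reversed();
        }
        Integer[] ids = new Integer[docs.length];
        for (int i = 0; i < ids.length; i++) {
            ids[i] = i;
        }
        Arrays.sort(ids, byKey);
        return ids;
    }

    public static void main(String[] args) {
        long[][] docs = { {2, 30}, {1, 5}, {10} };
        System.out.println(Arrays.toString(sortDocs(docs, false))); // [1, 0, 2]
        System.out.println(Arrays.toString(sortDocs(docs, true)));  // [0, 2, 1]
    }
}
```

Note that the reverse order here is not the forward order flipped, which is
part of why sorting on a multi-valued field is hard to define cleanly.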

On Tue, Oct 26, 2021 at 8:01 PM Robert Muir <rcm...@gmail.com> wrote:
>
> Hi Greg, I think the general issue is one of the API, the ValueSource
> seems really geared at returning values from single-valued fields.
>
> IMO, for the way the API is used (e.g. sorting), it makes sense to
> define a selector that works in O(1) time per-document, and use these
> existing valuesources:
>
> https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MultiValuedIntFieldSource.java
> https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MultiValuedLongFieldSource.java
> https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MultiValuedFloatFieldSource.java
> https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/MultiValuedDoubleFieldSource.java
>
> These require that you specify a "selector" as to who will be the
> "stuckee" (designated value) for the doc:
> https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/SortedNumericSelector.java
> I strongly recommend "min", as it can just read the first DV for each doc.
>
> For terms (strings), there is a similar thing:
>
> https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/SortedSetFieldSource.java
>
> And again, it has available selectors:
> https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/SortedSetSelector.java
> I would still strongly recommend "min", to just read the first DV for each doc.
>
> On Tue, Oct 26, 2021 at 7:49 PM Greg Miller <gsmil...@gmail.com> wrote:
> >
> > Hi folks-
> >
> > Out of curiosity, is there a reason Lucene doesn't have
> > implementations for concepts like DoubleValues / DoubleValuesSource
> > that support multiple values per document? Or maybe something like
> > this does exist in Lucene that I'm not aware of? I can't believe this
> > hasn't been a topic of discussion at least once, but I couldn't turn
> > up a past Jira issue.
> >
> > I ask because most of the faceting implementations in Lucene allow the
> > user to provide their own xxValuesSource to use instead of assuming
> > the data is in an indexed field, but there's an inherent limitation
> > here forcing documents to have a single value. The faceting
> > implementations have all been updated to operate correctly for
> > multi-valued documents when referencing an indexed field, but there's
> > a bit of a gap here if the user wants to supply their own source.
> >
> > Many thanks!
> >
> > Cheers,
> > -Greg
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org
> >
