Uwe, I think Petko's question was about making sure that missing
values are returned before non-missing values, even when some of
those non-missing values are equal to Long.MIN_VALUE. That isn't
possible today.
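
To make the tie concrete, here is a minimal sketch in plain Java. It does
not use Lucene's API: the Doc class, the comparator names and the sample
values are all hypothetical, and java.util.Comparator merely stands in for
a Lucene comparator. It shows that substituting Long.MIN_VALUE for missing
values leaves them tied with real Long.MIN_VALUE values, and that chaining
a second comparison on whether a value is present (the chaining idea from
the quoted mail below) breaks the tie.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class MissingFirstSketch {

    // Hypothetical stand-in for a document: a label plus a long value,
    // where null means the document has no value for the field.
    static final class Doc {
        final String label;
        final Long value;

        Doc(String label, Long value) {
            this.label = label;
            this.value = value;
        }
    }

    // Primary sort key: substitute Long.MIN_VALUE for a missing value.
    static long primaryKey(Doc d) {
        return d.value == null ? Long.MIN_VALUE : d.value;
    }

    // Substitution alone: missing docs compare equal to docs that really
    // hold Long.MIN_VALUE.
    static final Comparator<Doc> SUBSTITUTE_ONLY =
        Comparator.comparingLong(MissingFirstSketch::primaryKey);

    // Chained: on a tie, "value absent" (false) sorts before "value present" (true).
    static final Comparator<Doc> MISSING_FIRST =
        SUBSTITUTE_ONLY.thenComparing(
            (a, b) -> Boolean.compare(a.value != null, b.value != null));

    // Sorts a fixed sample with the given comparator and returns the labels
    // joined by commas.
    static String sortedLabels(Comparator<Doc> cmp) {
        List<Doc> docs = new ArrayList<>(Arrays.asList(
            new Doc("min", Long.MIN_VALUE),
            new Doc("missing", null),
            new Doc("zero", 0L),
            new Doc("neg", -5L)));
        docs.sort(cmp); // List.sort is stable, so tied docs keep input order
        StringBuilder sb = new StringBuilder();
        for (Doc d : docs) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(d.label);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(sortedLabels(SUBSTITUTE_ONLY)); // min,missing,neg,zero
        System.out.println(sortedLabels(MISSING_FIRST));   // missing,min,neg,zero
    }
}
```

In the first ordering, the relative position of "min" and "missing" is just
the input order, because the two keys compare equal and the sort is stable.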

I agree with your recommendation against going with bytes given the
overhead in case of high cardinality.

On Mon, Nov 21, 2022 at 11:08 AM Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
> Long.MIN_VALUE and Long.MAX_VALUE are the correct missing values for
> sorting longs. In fact, if you have Long.MIN_VALUE in your collection,
> empty values are treated the same as it, but empty values will still
> appear in the wanted place. In contrast to the default "0", it is not
> somewhere in the middle.
> Because there is no long that is smaller than Long.MIN_VALUE, the sort
> order will be OK.
>
> BTW, Apache Solr is using exactly those values to support missing values
> automatically (see sortMissingFirst, sortMissingLast schema options).
>
> In fact, string/bytes sorting theoretically has the same problem,
> because NULL is still different from empty. WARNING: If you really want
> to compare by byte[] as suggested in your last mail, keep in mind: When
> you sort against the raw bytes (using NumericUtils) with SORTED_SET
> docvalues type, there is a large overhead on indexing and sorting
> performance, especially for the case where you have many different
> values in your index (which is likely for numerics).
>
> Uwe
>
> On 17.11.2022 at 08:47, Adrien Grand wrote:
> > Hi Petko,
> >
> > Lucene's comparators for numerics have this limitation indeed. We haven't
> > got many questions around that in the past, which I would guess is due to
> > the fact that most numeric fields do not use the entire long range,
> > specifically Long.MIN_VALUE and Long.MAX_VALUE, so using either of these
> > works as a way to sort missing values first or last. If you have a field
> > that may use Long.MIN_VALUE and Long.MAX_VALUE, we do not have a comparator
> > that can easily sort missing values first or last reliably out of the box.
> >
> > The easiest option I can think of would consist of using the comparator for
> > longs with MIN_VALUE / MAX_VALUE for missing values depending on whether
> > you want missing values sorted first or last, and chain it with another
> > comparator (via a FieldComparatorSource) which would sort missing values
> > before/after existing values. The benefit of this approach is that you
> > would automatically benefit from some not-so-trivial features of Lucene's
> > comparator such as dynamic pruning.
> >
> > On Wed, Nov 16, 2022 at 9:16 PM Petko Minkov <pmin...@gmail.com> wrote:
> >
> >> Hello,
> >>
> >> When sorting documents by a NumericDocValuesField, how can documents be
> >> ordered such that those with missing values come before anything else
> >> in ascending sorts? SortField allows setting a missing value:
> >>
> >>      var sortField = new SortField("price", SortField.Type.LONG);
> >>      sortField.setMissingValue(null);
> >>
> >> This null is however converted into a long 0 and documents with missing
> >> values are considered equally ordered with documents with an actual 0
> >> value. It's possible to set the missing value to Long.MIN_VALUE, but that
> >> will have the same problem, just for a different long value.
> >>
> >> Besides writing a custom comparator, is there any simpler and still
> >> performant way to achieve this sort?
> >>
> >> --Petko
> >>
> >
> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de

-- 
Adrien
