Thanks Adrien, that approach makes sense! I suppose another option might be
to store the numbers in their sortable byte array representation (maybe using
NumericUtils.longToSortableBytes:
https://lucene.apache.org/core/9_2_0/core/org/apache/lucene/util/NumericUtils.html#longToSortableBytes(long,byte%5B%5D,int)
) in SortedDocValues, and then use a missing value of STRING_FIRST (
https://lucene.apache.org/core/9_0_0/core/org/apache/lucene/search/SortField.html#STRING_FIRST)?
It looks like TermOrdValComparator now supports dynamic pruning too, which you
implemented.
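
To make the idea concrete, here is a rough plain-Java sketch of why byte-encoded longs could sort correctly with missing values first. It does not use Lucene: the class SortableLongs and its methods are hypothetical names for illustration, and encode() only mirrors what I understand longToSortableBytes to do (flip the sign bit, write big-endian). A null array stands in for a document without a value.

```java
import java.util.Arrays;

// Illustration only: SortableLongs is a hypothetical helper, not a Lucene class.
final class SortableLongs {

    // Assumed equivalent of the sortable-bytes encoding: flip the sign bit and
    // write the result big-endian, so that unsigned byte-wise order matches
    // signed long order.
    static byte[] encode(long value) {
        long flipped = value ^ 0x8000000000000000L;
        byte[] out = new byte[Long.BYTES];
        for (int i = 0; i < Long.BYTES; i++) {
            out[i] = (byte) (flipped >>> (56 - 8 * i));
        }
        return out;
    }

    // Ascending comparison in which a missing value (null) sorts before every
    // encoded value, including the encoding of Long.MIN_VALUE.
    static int compareMissingFirst(byte[] a, byte[] b) {
        if (a == null || b == null) {
            return a == b ? 0 : (a == null ? -1 : 1);
        }
        return Arrays.compareUnsigned(a, b);
    }
}
```

In Lucene itself, I would expect this to translate to indexing the encoded bytes in a SortedDocValuesField and sorting with a SortField of Type.STRING whose missing value is SortField.STRING_FIRST, but I have not tried it.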

On Wed, Nov 16, 2022 at 11:47 PM Adrien Grand <jpou...@gmail.com> wrote:

> Hi Petko,
>
> Lucene's comparators for numerics do have this limitation. We haven't had
> many questions about it in the past, which I would guess is because most
> numeric fields do not use the entire long range, specifically
> Long.MIN_VALUE and Long.MAX_VALUE, so using either of these works as a way
> to sort missing values first or last. If you have a field that may use
> Long.MIN_VALUE and Long.MAX_VALUE, we do not have a comparator that can
> reliably sort missing values first or last out of the box.
>
> The easiest option I can think of would be to use the comparator for
> longs with MIN_VALUE / MAX_VALUE for missing values, depending on whether
> you want missing values sorted first or last, and chain it with another
> comparator (via a FieldComparatorSource) which would sort missing values
> before/after existing values. The benefit of this approach is that you
> would automatically benefit from some not-so-trivial features of Lucene's
> comparators, such as dynamic pruning.
>
> On Wed, Nov 16, 2022 at 9:16 PM Petko Minkov <pmin...@gmail.com> wrote:
>
> > Hello,
> >
> > When sorting documents by a NumericDocValuesField, how can documents be
> > ordered such that those with missing values come before anything else
> > in ascending sorts? SortField allows setting a missing value:
> >
> >     var sortField = new SortField("price", SortField.Type.LONG);
> >     sortField.setMissingValue(null);
> >
> > This null is, however, converted into a long 0, and documents with missing
> > values are ordered as equal to documents with an actual value of 0. It's
> > possible to set the missing value to Long.MIN_VALUE, but that will have
> > the same problem, just for a different long value.
> >
> > Besides writing a custom comparator, is there any simpler and still
> > performant way to achieve this sort?
> >
> > --Petko
> >
>
>
> --
> Adrien
>
