Hi,
Using Long.MIN_VALUE or Long.MAX_VALUE as the missing value is the
correct way to sort longs. If your collection actually contains
Long.MIN_VALUE, documents with a missing value are treated the same as
those documents, but the missing values still end up at the wanted place.
In contrast to the default "0", they are not somewhere in the middle.
Because no long is smaller than Long.MIN_VALUE, the sort order will be
OK.
BTW, Apache Solr is using exactly those values to support missing values
automatically (see sortMissingFirst, sortMissingLast schema options).
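To make the ordering argument concrete, here is a minimal plain-Java
sketch (not Lucene code) that substitutes a chosen value for missing
entries before sorting. The class MissingSubstitution and the method
sortWithMissing are hypothetical names used only for illustration, and
null stands in for a document without a value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class MissingSubstitution {
    // Sorts ascending, treating null (a missing value) as `missing`.
    // List.sort is stable, so entries with equal keys keep their input order.
    public static List<Long> sortWithMissing(List<Long> values, long missing) {
        List<Long> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparingLong((Long v) -> v == null ? missing : v));
        return sorted;
    }

    public static void main(String[] args) {
        List<Long> values = Arrays.asList(5L, null, -3L, 0L);
        // Missing treated as 0: it lands in the middle, tied with the real 0.
        System.out.println(sortWithMissing(values, 0L));             // [-3, null, 0, 5]
        // Missing treated as Long.MIN_VALUE: nothing can sort before it.
        System.out.println(sortWithMissing(values, Long.MIN_VALUE)); // [null, -3, 0, 5]
    }
}
```

With Long.MIN_VALUE as the substitute, a missing entry can only tie with
a real Long.MIN_VALUE. It can never end up behind a smaller value.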
In fact, string/bytes sorting has theoretically the same problem,
because NULL is still different from empty. WARNING: If you really want
to compare by byte[] as suggested in your last mail, keep in mind: When
you sort against the raw bytes (using NumericUtils) with SORTED_SET
docvalues type, there is a large overhead on indexing and sorting
performance, especially for the case where you have many different
values in your index (which is likely for numerics).
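For readers unfamiliar with sorting longs as raw bytes, the following is
a plain-Java sketch of the general idea: flip the sign bit and write the
value big-endian, so that unsigned byte-wise order matches numeric order.
I believe this is the same scheme NumericUtils uses, but treat that as an
assumption. The class SortableLongBytes and its methods are hypothetical
and not part of Lucene.

```java
import java.util.Arrays;

class SortableLongBytes {
    // Flips the sign bit and writes the long big-endian, so that unsigned
    // lexicographic byte order matches signed numeric order.
    public static byte[] encode(long value) {
        long flipped = value ^ 0x8000000000000000L;
        byte[] out = new byte[Long.BYTES];
        for (int i = 0; i < Long.BYTES; i++) {
            out[i] = (byte) (flipped >>> (56 - 8 * i));
        }
        return out;
    }

    // Compares two longs through their encoded bytes only.
    public static int compareEncoded(long a, long b) {
        return Arrays.compareUnsigned(encode(a), encode(b));
    }

    public static void main(String[] args) {
        System.out.println(compareEncoded(-1L, 1L) < 0);                         // true
        System.out.println(compareEncoded(Long.MIN_VALUE, Long.MAX_VALUE) < 0);  // true
        // An empty byte[] sorts before every encoded long, even Long.MIN_VALUE.
        System.out.println(Arrays.compareUnsigned(new byte[0], encode(Long.MIN_VALUE)) < 0); // true
    }
}
```

The last line shows why a byte-based sort can tell "missing" apart from
every real value. The performance cost described above still applies.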
Uwe
On 17.11.2022 at 08:47, Adrien Grand wrote:
Hi Petko,
Lucene's comparators for numerics have this limitation indeed. We haven't
got many questions around that in the past, which I would guess is due to
the fact that most numeric fields do not use the entire long range,
specifically Long.MIN_VALUE and Long.MAX_VALUE, so using either of these
works as a way to sort missing values first or last. If you have a field
that may use Long.MIN_VALUE and Long.MAX_VALUE, we do not have a comparator
that can easily sort missing values first or last reliably out of the box.
The easiest option I can think of would consist of using the comparator for
longs with MIN_VALUE / MAX_VALUE for missing values depending on whether
you want missing values sorted first or last, and chain it with another
comparator (via a FieldComparatorSource) which would sort missing values
before/after existing values. The benefit of this approach is that you
would automatically benefit from some not-so-trivial features of Lucene's
comparator such as dynamic pruning.
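The chaining idea can be illustrated outside Lucene with plain Java
comparators. This is only a sketch of the comparison logic, not a
FieldComparatorSource implementation: the class MissingFirstSort and its
members are hypothetical names, and null stands in for a document that
has no value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class MissingFirstSort {
    // Primary key: the value, with missing (null) replaced by Long.MIN_VALUE.
    static final Comparator<Long> BY_VALUE =
        Comparator.comparingLong((Long v) -> v == null ? Long.MIN_VALUE : v);

    // Tie-breaker: missing entries (false) sort before present ones (true).
    static final Comparator<Long> MISSING_BEFORE_PRESENT =
        Comparator.comparing((Long v) -> v != null);

    // Ascending sort with missing values strictly first.
    public static List<Long> sortMissingFirst(List<Long> values) {
        List<Long> sorted = new ArrayList<>(values);
        sorted.sort(BY_VALUE.thenComparing(MISSING_BEFORE_PRESENT));
        return sorted;
    }

    public static void main(String[] args) {
        List<Long> values = Arrays.asList(7L, Long.MIN_VALUE, null, 0L, null);
        // Prints [null, null, -9223372036854775808, 0, 7]
        System.out.println(sortMissingFirst(values));
    }
}
```

The tie-breaker is only consulted when the primary keys are equal, that
is, when a missing value meets a real Long.MIN_VALUE. In every other case
the primary comparator alone decides the order.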
On Wed, Nov 16, 2022 at 9:16 PM Petko Minkov <pmin...@gmail.com> wrote:
Hello,
When sorting documents by a NumericDocValuesField, how can documents be
ordered such that those with missing values can come before anything else
in ascending sorts? SortField allows setting a missing value:
var sortField = new SortField("price", SortField.Type.LONG);
sortField.setMissingValue(null);
This null is however converted into a long 0 and documents with missing
values are considered equally ordered with documents with an actual 0
value. It's possible to set the missing value to Long.MIN_VALUE, but that
will have the same problem, just for a different long value.
Besides writing a custom comparator, is there any simpler and still
performant way to achieve this sort?
--Petko
--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de