Thanks for the help! I didn't know about the performance issues with byte
arrays vs numbers in this particular case.

On Mon, Nov 21, 2022 at 3:25 AM Adrien Grand <jpou...@gmail.com> wrote:

> Uwe, I think that Petko's question was about making sure that missing
> values would be returned before non-missing values, even though some of
> these non-missing values might be equal to Long.MIN_VALUE. Which isn't
> possible today.
>
> I agree with your recommendation against going with bytes given the
> overhead in case of high cardinality.
>
> On Mon, Nov 21, 2022 at 11:08 AM Uwe Schindler <u...@thetaphi.de> wrote:
>
> > Hi,
> >
> > Long.MIN_VALUE and Long.MAX_VALUE are the correct way for longs to sort.
> > In fact if you have Long.MIN_VALUE in your collection, empty values are
> > treated the same, but still empty value will appear at the wanted place.
> > In contrast to the default "0", it is not somewhere in the middle.
> > Because there is no long that is smaller than Long.MIN_VALUE, the sort
> > order will be OK.
> >
> > BTW, Apache Solr is using exactly those values to support missing values
> > automatically (see sortMissingFirst, sortMissingLast schema options).
> >
> > In fact, string/bytes sorting has theoretically the same problem,
> > because NULL is still different that empty. WARNING: If you really want
> > to compare by byte[] as suggested in your last mail, keep in mind: When
> > you sort against the raw bytes (using NumericUtils) with SORTED_SET
> > docvalues type, there is a large overhead on indexing and sorting
> > performance, especially for the case where you have many different
> > values in your index (which is likely for numerics).
> >
> > Uwe
> >
> > Am 17.11.2022 um 08:47 schrieb Adrien Grand:
> > > Hi Petko,
> > >
> > > Lucene's comparators for numerics have this limitation indeed. We
> haven't
> > > got many questions around that in the past, which I would guess is due
> to
> > > the fact that most numeric fields do not use the entire long range,
> > > specifically Long.MIN_VALUE and Long.MAX_VALUE, so using either of
> these
> > > works as a way to sort missing values first or last. If you have a
> field
> > > that may use Long.MIN_VALUE and long.MAX_VALUE, we do not have a
> > comparator
> > > that can easily sort missing values first or last reliably out of the
> > box.
> > >
> > > The easier option I can think of would consist of using the comparator
> > for
> > > longs with MIN_VALUE / MAX_VALUE for missing values depending on
> whether
> > > you want missing values sorted first or last, and chain it with another
> > > comparator (via a FieldComparatorSource) which would sort missing
> values
> > > before/after existing values. The benefit of this approach is that you
> > > would automatically benefit from some not-so-trivial features of
> Lucene's
> > > comparator such as dynamic pruning.
> > >
> > > On Wed, Nov 16, 2022 at 9:16 PM Petko Minkov <pmin...@gmail.com>
> wrote:
> > >
> > >> Hello,
> > >>
> > >> When sorting documents by a NumericDocValuesField, how can documents
> be
> > >> ordered such that those with missing values can come before anything
> > else
> > >> in ascending sorts? SortField allows to set a missing value:
> > >>
> > >>      var sortField = new SortField("price", SortField.Type.LONG);
> > >>      sortField.setMissingValue(null);
> > >>
> > >> This null is however converted into a long 0 and documents with
> missing
> > >> values are considered equally ordered with documents with an actual 0
> > >> value. It's possible to set the missing value to Long.MIN_VALUE, but
> > that
> > >> will have the same problem, just for a different long value.
> > >>
> > >> Besides writing a custom comparator, is there any simpler and still
> > >> performant way to achieve this sort?
> > >>
> > >> --Petko
> > >>
> > >
> > --
> > Uwe Schindler
> > Achterdiek 19, D-28357 Bremen
> > https://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >
> >
>
> --
> Adrien
>

Reply via email to