Uwe, I think that Petko's question was about making sure that missing values would be returned before non-missing values, even though some of these non-missing values might be equal to Long.MIN_VALUE. Which isn't possible today.
I agree with your recommendation against going with bytes given the overhead in case of high cardinality. On Mon, Nov 21, 2022 at 11:08 AM Uwe Schindler <u...@thetaphi.de> wrote: > Hi, > > Long.MIN_VALUE and Long.MAX_VALUE are the correct way for longs to sort. > In fact if you have Long.MIN_VALUE in your collection, empty values are > treated the same, but still empty value will appear at the wanted place. > In contrast to the default "0", it is not somewhere in the middle. > Because there is no long that is smaller than Long.MIN_VALUE, the sort > order will be OK. > > BTW, Apache Solr is using exactly those values to support missing values > automatically (see sortMissingFirst, sortMissingLast schema options). > > In fact, string/bytes sorting has theoretically the same problem, > because NULL is still different that empty. WARNING: If you really want > to compare by byte[] as suggested in your last mail, keep in mind: When > you sort against the raw bytes (using NumericUtils) with SORTED_SET > docvalues type, there is a large overhead on indexing and sorting > performance, especially for the case where you have many different > values in your index (which is likely for numerics). > > Uwe > > Am 17.11.2022 um 08:47 schrieb Adrien Grand: > > Hi Petko, > > > > Lucene's comparators for numerics have this limitation indeed. We haven't > > got many questions around that in the past, which I would guess is due to > > the fact that most numeric fields do not use the entire long range, > > specifically Long.MIN_VALUE and Long.MAX_VALUE, so using either of these > > works as a way to sort missing values first or last. If you have a field > > that may use Long.MIN_VALUE and long.MAX_VALUE, we do not have a > comparator > > that can easily sort missing values first or last reliably out of the > box. > > > > The easier option I can think of would consist of using the comparator > for > > longs with MIN_VALUE / MAX_VALUE for missing values depending on whether > > you want missing values sorted first or last, and chain it with another > > comparator (via a FieldComparatorSource) which would sort missing values > > before/after existing values. The benefit of this approach is that you > > would automatically benefit from some not-so-trivial features of Lucene's > > comparator such as dynamic pruning. > > > > On Wed, Nov 16, 2022 at 9:16 PM Petko Minkov <pmin...@gmail.com> wrote: > > > >> Hello, > >> > >> When sorting documents by a NumericDocValuesField, how can documents be > >> ordered such that those with missing values can come before anything > else > >> in ascending sorts? SortField allows to set a missing value: > >> > >> var sortField = new SortField("price", SortField.Type.LONG); > >> sortField.setMissingValue(null); > >> > >> This null is however converted into a long 0 and documents with missing > >> values are considered equally ordered with documents with an actual 0 > >> value. It's possible to set the missing value to Long.MIN_VALUE, but > that > >> will have the same problem, just for a different long value. > >> > >> Besides writing a custom comparator, is there any simpler and still > >> performant way to achieve this sort? > >> > >> --Petko > >> > > > -- > Uwe Schindler > Achterdiek 19, D-28357 Bremen > https://www.thetaphi.de > eMail: u...@thetaphi.de > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > -- Adrien