Thanks Adrien, that approach makes sense! I suppose another option would be to store the numbers in their sortable byte array representation (perhaps using NumericUtils.longToSortableBytes: https://lucene.apache.org/core/9_2_0/core/org/apache/lucene/util/NumericUtils.html#longToSortableBytes(long,byte%5B%5D,int) ) in SortedDocValues, and then sort on that field with a missing value of STRING_FIRST ( https://lucene.apache.org/core/9_0_0/core/org/apache/lucene/search/SortField.html#STRING_FIRST ). Since documents without a value would have no bytes at all, they could not collide with any real long, including Long.MIN_VALUE. It also looks like TermOrdValComparator now supports dynamic pruning, which you implemented.
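
To make the idea concrete, here is a rough plain-Java sketch of the encoding I have in mind, without any Lucene dependency. It assumes that longToSortableBytes flips the sign bit and writes the value big-endian, so that unsigned byte-wise order matches signed long order; the class and method names (SortableLongs, encode, decode, sortMissingFirst) are made up for illustration, and a null entry stands in for a document with no value. In a real index the bytes would go into SortedDocValues and STRING_FIRST would play the role of nullsFirst.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, for illustration only (not a Lucene class).
final class SortableLongs {

    // Assumed to mirror NumericUtils.longToSortableBytes: flip the sign bit,
    // then write the 8 bytes big-endian so that unsigned byte-wise order
    // equals signed long order.
    static byte[] encode(long value) {
        long bits = value ^ 0x8000000000000000L;
        byte[] out = new byte[Long.BYTES];
        for (int i = Long.BYTES - 1; i >= 0; i--) {
            out[i] = (byte) bits;
            bits >>>= 8;
        }
        return out;
    }

    // Inverse of encode: read the bytes big-endian and flip the sign bit back.
    static long decode(byte[] bytes) {
        long bits = 0;
        for (byte b : bytes) {
            bits = (bits << 8) | (b & 0xFFL);
        }
        return bits ^ 0x8000000000000000L;
    }

    // null models a missing value; nullsFirst stands in for STRING_FIRST.
    static final Comparator<byte[]> MISSING_FIRST =
        Comparator.nullsFirst((a, b) -> Arrays.compareUnsigned(a, b));

    // Sorts ascending by encoded bytes, with missing (null) entries first.
    static List<Long> sortMissingFirst(List<Long> values) {
        List<byte[]> encoded = new ArrayList<>();
        for (Long v : values) {
            encoded.add(v == null ? null : encode(v));
        }
        encoded.sort(MISSING_FIRST);
        List<Long> result = new ArrayList<>();
        for (byte[] e : encoded) {
            result.add(e == null ? null : decode(e));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> prices =
            Arrays.asList(0L, null, Long.MAX_VALUE, -5L, Long.MIN_VALUE, 42L);
        // prints [null, -9223372036854775808, -5, 0, 42, 9223372036854775807]
        System.out.println(sortMissingFirst(prices));
    }
}
```

The point of the sketch is that the missing entry sorts ahead of Long.MIN_VALUE rather than tying with it, which is the case the numeric comparators cannot express. The trade-off is that the field is then a binary term rather than a numeric doc value, so anything else that expects a numeric field would need its own copy.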

On Wed, Nov 16, 2022 at 11:47 PM Adrien Grand <jpou...@gmail.com> wrote:

> Hi Petko,
>
> Lucene's comparators for numerics have this limitation indeed. We haven't
> got many questions around that in the past, which I would guess is due to
> the fact that most numeric fields do not use the entire long range,
> specifically Long.MIN_VALUE and Long.MAX_VALUE, so using either of these
> works as a way to sort missing values first or last. If you have a field
> that may use Long.MIN_VALUE and Long.MAX_VALUE, we do not have a comparator
> that can easily sort missing values first or last reliably out of the box.
>
> The easiest option I can think of would consist of using the comparator for
> longs with MIN_VALUE / MAX_VALUE for missing values depending on whether
> you want missing values sorted first or last, and chain it with another
> comparator (via a FieldComparatorSource) which would sort missing values
> before/after existing values. The benefit of this approach is that you
> would automatically benefit from some not-so-trivial features of Lucene's
> comparator such as dynamic pruning.
>
> On Wed, Nov 16, 2022 at 9:16 PM Petko Minkov <pmin...@gmail.com> wrote:
>
> > Hello,
> >
> > When sorting documents by a NumericDocValuesField, how can documents be
> > ordered such that those with missing values come before anything else
> > in ascending sorts? SortField allows setting a missing value:
> >
> >     var sortField = new SortField("price", SortField.Type.LONG);
> >     sortField.setMissingValue(null);
> >
> > This null is however converted into a long 0 and documents with missing
> > values are considered equally ordered with documents with an actual 0
> > value. It's possible to set the missing value to Long.MIN_VALUE, but that
> > will have the same problem, just for a different long value.
> >
> > Besides writing a custom comparator, is there any simpler and still
> > performant way to achieve this sort?
> >
> > --Petko
> >
>
> --
> Adrien