Thanks for the help! I didn't know about the performance issues with byte arrays vs numbers in this particular case.
On Mon, Nov 21, 2022 at 3:25 AM Adrien Grand <jpou...@gmail.com> wrote: > Uwe, I think that Petko's question was about making sure that missing > values would be returned before non-missing values, even though some of > these non-missing values might be equal to Long.MIN_VALUE. Which isn't > possible today. > > I agree with your recommendation against going with bytes given the > overhead in case of high cardinality. > > On Mon, Nov 21, 2022 at 11:08 AM Uwe Schindler <u...@thetaphi.de> wrote: > > > Hi, > > > > Long.MIN_VALUE and Long.MAX_VALUE are the correct way for longs to sort. > > In fact if you have Long.MIN_VALUE in your collection, empty values are > > treated the same, but still empty value will appear at the wanted place. > > In contrast to the default "0", it is not somewhere in the middle. > > Because there is no long that is smaller than Long.MIN_VALUE, the sort > > order will be OK. > > > > BTW, Apache Solr is using exactly those values to support missing values > > automatically (see sortMissingFirst, sortMissingLast schema options). > > > > In fact, string/bytes sorting has theoretically the same problem, > > because NULL is still different that empty. WARNING: If you really want > > to compare by byte[] as suggested in your last mail, keep in mind: When > > you sort against the raw bytes (using NumericUtils) with SORTED_SET > > docvalues type, there is a large overhead on indexing and sorting > > performance, especially for the case where you have many different > > values in your index (which is likely for numerics). > > > > Uwe > > > > Am 17.11.2022 um 08:47 schrieb Adrien Grand: > > > Hi Petko, > > > > > > Lucene's comparators for numerics have this limitation indeed. We > haven't > > > got many questions around that in the past, which I would guess is due > to > > > the fact that most numeric fields do not use the entire long range, > > > specifically Long.MIN_VALUE and Long.MAX_VALUE, so using either of > these > > > works as a way to sort missing values first or last. If you have a > field > > > that may use Long.MIN_VALUE and long.MAX_VALUE, we do not have a > > comparator > > > that can easily sort missing values first or last reliably out of the > > box. > > > > > > The easier option I can think of would consist of using the comparator > > for > > > longs with MIN_VALUE / MAX_VALUE for missing values depending on > whether > > > you want missing values sorted first or last, and chain it with another > > > comparator (via a FieldComparatorSource) which would sort missing > values > > > before/after existing values. The benefit of this approach is that you > > > would automatically benefit from some not-so-trivial features of > Lucene's > > > comparator such as dynamic pruning. > > > > > > On Wed, Nov 16, 2022 at 9:16 PM Petko Minkov <pmin...@gmail.com> > wrote: > > > > > >> Hello, > > >> > > >> When sorting documents by a NumericDocValuesField, how can documents > be > > >> ordered such that those with missing values can come before anything > > else > > >> in ascending sorts? SortField allows to set a missing value: > > >> > > >> var sortField = new SortField("price", SortField.Type.LONG); > > >> sortField.setMissingValue(null); > > >> > > >> This null is however converted into a long 0 and documents with > missing > > >> values are considered equally ordered with documents with an actual 0 > > >> value. It's possible to set the missing value to Long.MIN_VALUE, but > > that > > >> will have the same problem, just for a different long value. > > >> > > >> Besides writing a custom comparator, is there any simpler and still > > >> performant way to achieve this sort? > > >> > > >> --Petko > > >> > > > > > -- > > Uwe Schindler > > Achterdiek 19, D-28357 Bremen > > https://www.thetaphi.de > > eMail: u...@thetaphi.de > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > > For additional commands, e-mail: java-user-h...@lucene.apache.org > > > > > > -- > Adrien >