We had this monotonic detection/compression in the past, but had to remove it because it caused too many slowdowns.
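
To make the idea concrete, here is a rough sketch of what a per-segment "are the values sorted?" check piggybacked on a gcd pass could look like. This is not the Lucene code: the class `MonotonicScan`, its `scan` method, and the `Stats` holder are hypothetical names for illustration only, it works on a plain `long[]` rather than on doc values, and it ignores overflow.

```java
// Hypothetical illustration only; not Lucene's actual implementation.
final class MonotonicScan {

    /** What a single pass over one segment's values learns. */
    static final class Stats {
        final long gcd;       // gcd of the deltas from the first value (0 if fewer than 2 values)
        final boolean sorted; // true if the values never decrease

        Stats(long gcd, boolean sorted) {
            this.gcd = gcd;
            this.sorted = sorted;
        }
    }

    /** Euclid's algorithm on absolute values; overflow is ignored in this sketch. */
    static long gcd(long a, long b) {
        a = Math.abs(a);
        b = Math.abs(b);
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    /** One pass: accumulate the gcd and, at the same time, check for sortedness. */
    static Stats scan(long[] values) {
        long g = 0;
        boolean sorted = true;
        for (int i = 1; i < values.length; i++) {
            g = gcd(g, values[i] - values[0]);
            if (values[i] < values[i - 1]) {
                sorted = false;
            }
        }
        return new Stats(g, sorted);
    }

    public static void main(String[] args) {
        long[][] segments = {
            {3, 1, 2},    // stands in for a big, unsorted segment
            {10, 20, 30}, // small segment that happens to be sorted
            {42}          // single-document segment
        };
        for (long[] segment : segments) {
            Stats s = scan(segment);
            System.out.println("sorted=" + s.sorted + " gcd=" + s.gcd);
        }
    }
}
```

Note that a segment with a single value always passes the sortedness check, so the outcome is decided segment by segment, not field by field.
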
I think it easily causes too much type pollution. For example, in a typical large index with an unsorted docvalues field, big segments won't be sorted, tiny segments with a few values might happen to be sorted (by chance), and the tiniest ones, e.g. with a single document, are always sorted. Now we have a mix of monotonic and non-monotonic encodings over the same field.

On the other hand, the optimization is very fragile and rarely applies: even for log users who actually sort on that field at index time, it will apply to just one field out of the dozens or hundreds they typically have. But it may destroy the performance of all the other fields, and overall it causes more harm than good.

On Tue, Jun 15, 2021 at 5:49 AM LuXugang <xugan...@icloud.com.invalid> wrote:
>
> Hi,
>
> In Lucene80DocValuesConsumer#writeValues(FieldInfo field, DocValuesProducer valuesProducer), all numericDocValues are visited to calculate the gcd. In the meantime, we could check whether all values are sorted. If so, maybe we could use DirectMonotonicWriter to store them. DirectMonotonicWriter can get impressive compression.
>
> In addition, when I use Elasticsearch to store numeric field types, at the Lucene level the data is always stored at least as NumericDocValues/SortedNumericDocValues. So when indexing sorted values like ID or TIMESTAMP, maybe the above optimization is applicable.
>
> Could I have some suggestions?
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org