I believe that this sort of optimization would be more effective and robust if we made doc values look more like postings, with relatively small blocks of values that would get compressed independently and decompressed in bulk. This way, we wouldn't require data to be sorted across entire segments for this optimization to kick in, and we would be less likely to slow down the normal case.
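A rough sketch of the per-block idea, under stated assumptions: the class, the enum, and the method names below are hypothetical and are not Lucene APIs. Plain delta encoding stands in for whatever a monotonic writer would really do, and the value range within a block is assumed not to overflow a long. The point is only that the encoding decision is made per block, so a field that is sorted in some places and not in others still benefits where it can.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names exist in Lucene. */
final class BlockwiseEncodingSketch {

  /** The two illustrative encodings a block can pick from. */
  enum Encoding { MONOTONIC_DELTA, MIN_OFFSET }

  /** Bits needed to represent a non-negative value (0 bits for the value 0). */
  static int bitsRequired(long v) {
    return 64 - Long.numberOfLeadingZeros(v);
  }

  /** Picks an encoding for values[from, to) by looking only at that block. */
  static Encoding chooseEncoding(long[] values, int from, int to) {
    for (int i = from + 1; i < to; i++) {
      if (values[i] < values[i - 1]) {
        return Encoding.MIN_OFFSET; // not sorted: fall back to offsets from the block minimum
      }
    }
    return Encoding.MONOTONIC_DELTA; // sorted within this block
  }

  /** Bits per value the chosen encoding would need for values[from, to). */
  static int bitsPerValue(long[] values, int from, int to) {
    if (chooseEncoding(values, from, to) == Encoding.MONOTONIC_DELTA) {
      long maxDelta = 0;
      for (int i = from + 1; i < to; i++) {
        maxDelta = Math.max(maxDelta, values[i] - values[i - 1]);
      }
      return bitsRequired(maxDelta); // the first value would be stored separately
    }
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    for (int i = from; i < to; i++) {
      min = Math.min(min, values[i]);
      max = Math.max(max, values[i]);
    }
    return bitsRequired(max - min); // assumes max - min does not overflow
  }

  /** Splits the values into fixed-size blocks and picks an encoding for each one. */
  static List<Encoding> plan(long[] values, int blockSize) {
    List<Encoding> result = new ArrayList<>();
    for (int from = 0; from < values.length; from += blockSize) {
      int to = Math.min(values.length, from + blockSize);
      result.add(chooseEncoding(values, from, to));
    }
    return result;
  }
}
```

With a block size of 4 and the values 1, 2, 3, 4, 9, 3, 7, 1, 10, 20, 30, 40, the first and last blocks would use the monotonic encoding and only the middle block would fall back, even though the sequence as a whole is not sorted.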

On Tue, Jun 15, 2021 at 12:06 PM Robert Muir <rcm...@gmail.com> wrote:

> We did this monotonic detection/compression before in older times, but
> had to remove it because it caused too many slowdowns.
>
> I think it easily causes too much type pollution. For example, for a
> typical large index with an unsorted docvalues field, big segments
> won't be sorted, tiny segments with a few values might happen to be
> sorted (depending on chance/luck), and tiny tiny ones with e.g. a single
> document are sorted. Now we have a mix of monotonic and non-monotonic
> over the same field.
>
> On the other hand, the optimization is very fragile and rare: even for
> these log users actually sorting on that field at index-time, it will
> just apply to one field out of the somehow typical dozens/hundreds
> that they like to have. But it may destroy performance of all the other
> fields, and overall causes more harm than good.
>
> On Tue, Jun 15, 2021 at 5:49 AM LuXugang <xugan...@icloud.com.invalid>
> wrote:
> >
> > Hi,
> >
> > In class Lucene80DocValuesConsumer#writeValues(FieldInfo field,
> > DocValuesProducer valuesProducer), all numericDocValues are visited to
> > calculate the gcd. In the meantime, we can check whether all values are
> > sorted. If so, maybe we could use DirectMonotonicWriter to store them.
> > DirectMonotonicWriter can get impressive compression.
> >
> > In addition, when I use Elasticsearch to store numeric field types, at
> > the Lucene level the data is always stored at least as
> > NumericDocValues/SortedNumericDocValues. So when indexing sorted
> > values like ID or TIMESTAMP, maybe the above optimization is applicable.
> >
> > Could I have some suggestions?
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org

--
Adrien