We did this monotonic detection/compression in the past, but had to
remove it because it caused too many slowdowns.

I think it easily causes too much type pollution. For example, for a
typical large index with an unsorted docvalues field, big segments
won't be sorted, tiny segments with a few values might happen to be
sorted (by pure chance), and the tiniest ones with e.g. a single
document are always sorted. Now we have a mix of monotonic and
non-monotonic encodings over the same field, so the same read path
sees several different implementations.
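
To make the mixed-segment effect concrete, here is a rough sketch. It is
not Lucene code: the class and method names are made up for illustration,
and it assumes a per-segment check that simply tests whether the values
are non-decreasing in doc-ID order. It counts how many randomly filled
"segments" of a given size would pass such a check.

```java
import java.util.Random;

// Hypothetical illustration only; none of these names exist in Lucene.
public class SegmentMonotonicSketch {

    /** Returns true if the values are non-decreasing in doc-ID order. */
    public static boolean isMonotonic(long[] values) {
        for (int i = 1; i < values.length; i++) {
            if (values[i] < values[i - 1]) {
                return false;
            }
        }
        return true;
    }

    /** Counts how many of {@code trials} randomly filled segments of the given size are sorted. */
    public static int countSortedSegments(int segmentSize, int trials, long seed) {
        Random random = new Random(seed);
        int sorted = 0;
        for (int t = 0; t < trials; t++) {
            long[] values = new long[segmentSize];
            for (int i = 0; i < segmentSize; i++) {
                values[i] = random.nextLong(); // stands in for an unsorted field
            }
            if (isMonotonic(values)) {
                sorted++;
            }
        }
        return sorted;
    }

    public static void main(String[] args) {
        int trials = 10_000;
        for (int size : new int[] {1, 2, 3, 5, 1000}) {
            int sorted = countSortedSegments(size, trials, 42L);
            System.out.println(size + " docs/segment: " + sorted + " of " + trials + " segments sorted");
        }
    }
}
```

A single-document segment always passes, a segment of n distinct random
values passes with probability 1/n!, and large segments practically never
do, so one field would end up with both encodings across its segments.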

On the other hand, the optimization is very fragile and rarely applies:
even for those logging users who actually sort on that field at index
time, it will apply to just one field out of the dozens or hundreds
that they typically have. But it may destroy the performance of all the
other fields, so overall it causes more harm than good.

On Tue, Jun 15, 2021 at 5:49 AM LuXugang <xugan...@icloud.com.invalid> wrote:
>
> Hi,
>
> In Lucene80DocValuesConsumer#writeValues(FieldInfo field,
> DocValuesProducer valuesProducer), all NumericDocValues are visited to
> calculate the gcd. In the meantime, we can check whether all values are
> sorted. If so, maybe we could use DirectMonotonicWriter to store them.
> DirectMonotonicWriter can get impressive compression.
>
> In addition, when I use Elasticsearch to store numeric field types, at the
> Lucene level the data is always stored at least as
> NumericDocValues/SortedNumericDocValues. So when indexing sorted values
> like ID or TIMESTAMP, maybe the above optimization is applicable.
>
> Could I have some suggestions?
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
