Thanks, Robert, Adrien. Your replies are helpful to me.

> On Jun 15, 2021, at 10:19 PM, Robert Muir <rcm...@gmail.com> wrote:
>
> Well it definitely wouldn't be as useful as changing to a
> postings-style approach. That would bring a lot more benefits to
> general cases, e.g. use of PFOR and so on.
>
> But it is also easier to implement right now, to accelerate cases
> where fields are sorted, without hurting other things.
>
> On Tue, Jun 15, 2021 at 9:53 AM Adrien Grand <jpou...@gmail.com> wrote:
>>
>> SegmentWriteState has a reference to SegmentInfo which itself has the index
>> sort, so I believe that it would be possible.
>>
>> I wonder how useful it would be in practice. E.g. in the Elasticsearch case,
>> even though we store lots of time-based data and have been looking into
>> index sorting for storage/query efficiency reasons, the index sorts that we
>> are interested in in practice look more like `host.name ASC, @timestamp
>> DESC` than just `@timestamp DESC`. The reason for sorting by `host` first is
>> that it helps a lot with storage/query efficiency of metadata that is tied
>> to the host (e.g. IP addresses, operating system, etc.), and then because
>> `host.name` is usually a low-cardinality field, queries by descending
>> timestamp remain super efficient thanks to LUCENE-9280. So we'd be more
>> interested in an optimization that would support piecewise monotonic fields.
>>
>> On Tue, Jun 15, 2021 at 3:33 PM Robert Muir <rcm...@gmail.com> wrote:
>>>
>>> +1 to that idea. Maybe a shorter-term possibility would be to only do
>>> this compression on a field when the user has explicitly configured
>>> index sorting on the field (can we hackishly peek at it and tell?)
>>>
>>> On Tue, Jun 15, 2021 at 9:04 AM Adrien Grand <jpou...@gmail.com> wrote:
>>>>
>>>> I believe that this sort of optimization would be more effective and
>>>> robust if we made doc values look more like postings, with relatively
>>>> small blocks of values that would get compressed independently and
>>>> decompressed in bulk. This way, we wouldn't require data to be sorted
>>>> across entire segments for this optimization to kick in, and we would be
>>>> less likely to slow down the normal case.
>>>>
>>>> On Tue, Jun 15, 2021 at 12:06 PM Robert Muir <rcm...@gmail.com> wrote:
>>>>>
>>>>> We did this monotonic detection/compression before in older times, but
>>>>> had to remove it because it caused too many slowdowns.
>>>>>
>>>>> I think it easily causes too much type pollution. For example, for a
>>>>> typical large index with an unsorted docvalues field, big segments
>>>>> won't be sorted, tiny segments with a few values might happen to be
>>>>> sorted (depending on chance/luck), and tiny tiny ones with e.g. a single
>>>>> document are sorted. Now we have a mix of monotonic and non-monotonic
>>>>> over the same field.
>>>>>
>>>>> On the other hand, the optimization is very fragile and rare: even for
>>>>> these log users actually sorting on that field at index-time, it will
>>>>> just apply to one field out of the somehow typical dozens/hundreds
>>>>> that they like to have. But it may destroy performance of all the other
>>>>> fields and overall cause more harm than good.
>>>>>
>>>>> On Tue, Jun 15, 2021 at 5:49 AM LuXugang <xugan...@icloud.com.invalid>
>>>>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> In Lucene80DocValuesConsumer#writeValues(FieldInfo field,
>>>>>> DocValuesProducer valuesProducer), all NumericDocValues are visited
>>>>>> to calculate the gcd. In the same pass, we could check whether all
>>>>>> values are sorted. If so, maybe we could use DirectMonotonicWriter to
>>>>>> store them. DirectMonotonicWriter can get impressive compression.
>>>>>>
>>>>>> In addition, when I use Elasticsearch to store numeric field types, at
>>>>>> the Lucene level the data is always stored at least as
>>>>>> NumericDocValues/SortedNumericDocValues. So when indexing sorted
>>>>>> values like ID or TIMESTAMP, maybe the above optimization is applicable.
>>>>>>
>>>>>> Could I have some suggestions?
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>>
>>>> --
>>>> Adrien
>>
>> --
>> Adrien
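
For illustration, the check proposed in the original message (detecting sortedness in the same pass that computes the gcd) might look roughly like the sketch below. This is plain Java, not Lucene code: the class `MonotonicScanSketch`, its `Stats` holder and the `scan` method are hypothetical names, the input is a simple `long[]` rather than a NumericDocValues iterator, and overflow handling for very large values is deliberately left out.

```java
// Hypothetical sketch, not Lucene's implementation: one pass over the
// values that collects min, max, gcd and a "non-decreasing" flag.
final class MonotonicScanSketch {

    /** Result of a single pass over the values (hypothetical holder). */
    static final class Stats {
        final long min, max, gcd;
        final boolean nonDecreasing;

        Stats(long min, long max, long gcd, boolean nonDecreasing) {
            this.min = min;
            this.max = max;
            this.gcd = gcd;
            this.nonDecreasing = nonDecreasing;
        }
    }

    /** Euclid's algorithm on absolute values; gcd(0, x) == |x|. */
    static long gcd(long a, long b) {
        a = Math.abs(a);
        b = Math.abs(b);
        while (b != 0) {
            long t = a % b;
            a = b;
            b = t;
        }
        return a;
    }

    /** Assumes differences between values fit in a long (no overflow guard). */
    static Stats scan(long[] values) {
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE, g = 0;
        boolean nonDecreasing = true;
        for (int i = 0; i < values.length; i++) {
            long v = values[i];
            min = Math.min(min, v);
            max = Math.max(max, v);
            if (i > 0) {
                // gcd of the offsets from the first value
                g = gcd(g, v - values[0]);
                // the extra check: one comparison per value
                if (v < values[i - 1]) {
                    nonDecreasing = false;
                }
            }
        }
        return new Stats(min, max, g, nonDecreasing);
    }

    public static void main(String[] args) {
        Stats s = scan(new long[] {100, 110, 130, 130, 170});
        System.out.println("gcd=" + s.gcd + " nonDecreasing=" + s.nonDecreasing);
    }
}
```

The extra cost is a single comparison per value. The concern raised in the replies is less about this write-time check than about segments of the same field ending up with a mix of monotonic and non-monotonic encodings.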
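
As a rough illustration of the "piecewise monotonic" case Adrien describes (an index sorted by `host.name ASC, @timestamp DESC`), the sketch below checks monotonicity per fixed-size block instead of per segment. It is only a toy under stated assumptions: `BlockMonotonicSketch`, `isMonotonic` and `countMonotonicBlocks` are hypothetical names, the block size is arbitrary, and treating both ascending and descending runs as "monotonic" is a choice made for this sketch, not a statement about what any Lucene writer accepts.

```java
// Hypothetical sketch: decide per block, not per segment, whether the
// values are monotonic. Plain Java, no Lucene types involved.
final class BlockMonotonicSketch {

    /** True if values[from, to) never increase or never decrease. */
    static boolean isMonotonic(long[] values, int from, int to) {
        boolean nonDecreasing = true;
        boolean nonIncreasing = true;
        for (int i = from + 1; i < to; i++) {
            if (values[i] < values[i - 1]) {
                nonDecreasing = false;
            }
            if (values[i] > values[i - 1]) {
                nonIncreasing = false;
            }
        }
        return nonDecreasing || nonIncreasing;
    }

    /** Counts the fixed-size blocks that are monotonic on their own. */
    static int countMonotonicBlocks(long[] values, int blockSize) {
        int count = 0;
        for (int from = 0; from < values.length; from += blockSize) {
            int to = Math.min(values.length, from + blockSize);
            if (isMonotonic(values, from, to)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Timestamps for three hosts, descending within each host.
        long[] timestamps = {50, 40, 30, 20, 55, 45, 35, 25, 60, 58};
        System.out.println("whole array monotonic: "
            + isMonotonic(timestamps, 0, timestamps.length));
        System.out.println("monotonic blocks of 4: "
            + countMonotonicBlocks(timestamps, 4));
    }
}
```

In this toy data the array as a whole is not monotonic, yet blocks that do not straddle a host boundary are. That is the property a block-wise encoding could exploit without requiring the field to be sorted across the entire segment.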