Thanks for your prompt reply!

On Tue, Sep 10, 2024 at 1:38 PM Adrien Grand <jpou...@gmail.com> wrote:

> Can you clarify what you refer to by match-all and match-many queries?
> Lucene's MatchAllDocsQuery should not be impacted since it doesn't use
> postings for evaluation.
>
By match-all I mean a query that matches every doc, e.g. a term query on
"A" where every doc contains the term "A". By match-many I mean a query
that matches a high percentage of the total docs.
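
To make the two terms concrete, here is a toy sketch in plain Java (not
Lucene code; the class and method names are made up for illustration). It
computes the fraction of docs that contain a term: a ratio of 1.0 is what I
call match-all, and a ratio close to 1.0 is what I call match-many.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper, only for illustrating the terminology.
final class MatchRatioSketch {
    // Fraction of documents whose term set contains the given term.
    static double matchRatio(List<Set<String>> docs, String term) {
        if (docs.isEmpty()) return 0.0;
        long hits = docs.stream().filter(d -> d.contains(term)).count();
        return (double) hits / docs.size();
    }

    public static void main(String[] args) {
        List<Set<String>> docs = List.of(
            Set.of("A", "B"), Set.of("A"), Set.of("A", "B", "C"), Set.of("A", "B"));
        System.out.println(matchRatio(docs, "A")); // 1.0  -> match-all
        System.out.println(matchRatio(docs, "B")); // 0.75 -> match-many
        System.out.println(matchRatio(docs, "C")); // 0.25 -> selective
    }
}
```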

>
> Since FOR is a bit less space-efficient than PFOR, I guess it could be a
> bit slower if your Directory abstraction was a bit slow at reading data.
> Are you using Lucene's MMapDirectory?
>
Yes, we use mmap for posting list index files.

>
> Elasticsearch indeed only retained PFOR for space-efficiency reasons. We
> have many indexes that use IndexOptions.DOCS where the move from PFOR to
> FOR significantly increased disk usage (unlike indexes that use
> IndexOptions.DOCS_AND_FREQS_AND_POSITIONS where space is typically
> dominated by positions anyway).
>
Got it. Thanks!
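
For anyone following the thread, below is a rough sketch of why FOR can
take more space than PFOR on doc-ID deltas. It is a simplified cost model,
not Lucene's actual encoder: the block size of 128, the cap of 7 exceptions,
the 2 bytes per exception, the 1 header byte, and all names are assumptions
I picked for illustration.

```java
import java.util.Arrays;

// Simplified size model for one block of deltas; not Lucene's implementation.
final class ForVsPforSketch {
    static final int BLOCK_SIZE = 128;   // assumed number of values per block
    static final int MAX_EXCEPTIONS = 7; // assumed cap on patched values per block

    // Number of bits needed to represent v (0 needs 0 bits).
    static int bitsRequired(long v) {
        return 64 - Long.numberOfLeadingZeros(v);
    }

    static int maxBitsRequired(long[] block) {
        int width = 0;
        for (long v : block) width = Math.max(width, bitsRequired(v));
        return width;
    }

    // FOR: every value is packed at the width of the largest value in the block.
    static int forBytes(long[] block) {
        int width = maxBitsRequired(block);
        return 1 + (block.length * width + 7) / 8; // header byte + packed payload
    }

    // PFOR (simplified): pack at a narrower width and store the few values that
    // do not fit as exceptions, assumed to cost 2 bytes each (index + high bits).
    static int pforBytes(long[] block) {
        int maxWidth = maxBitsRequired(block);
        int best = Integer.MAX_VALUE;
        // Keep the high bits of an exception within one byte.
        for (int width = Math.max(0, maxWidth - 8); width <= maxWidth; width++) {
            int exceptions = 0;
            for (long v : block) if (bitsRequired(v) > width) exceptions++;
            if (exceptions > MAX_EXCEPTIONS) continue;
            int bytes = 1 + (block.length * width + 7) / 8 + 2 * exceptions;
            best = Math.min(best, bytes);
        }
        return best; // width == maxWidth always qualifies, so best is always set
    }

    public static void main(String[] args) {
        long[] deltas = new long[BLOCK_SIZE];
        Arrays.fill(deltas, 3); // mostly small gaps between doc IDs
        deltas[17] = 200;       // one large gap forces FOR to use 8 bits per value
        System.out.println("FOR bytes:  " + forBytes(deltas));  // 129
        System.out.println("PFOR bytes: " + pforBytes(deltas)); // 35
    }
}
```

In this model a single outlier makes the FOR block several times larger,
while a block where all deltas are 1 (a term present in every doc) costs the
same under both schemes.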

>
> On Tue, Sep 10, 2024 at 9:31 PM Rui Wu <rui...@mongodb.com.invalid> wrote:
>
> > Dear experts,
> >
> > I have a question about the following change:
> > Lucene 9.11 changed the postings list format
> > (Lucene GITHUB#12696 <https://github.com/apache/lucene/pull/12696>: Change
> > Postings back to using FOR in Lucene99PostingsFormat. Freqs, positions and
> > offset keep using PFOR).
> >
> > However, in our (MongoDB Atlas Search) internal performance testing, we saw
> > an increase in query latency of up to 32% on match-all and match-many
> > inverted-index-based queries, e.g. query.phrase-slop-0 and
> > query.date-facet-match-all.
> >
> >
> > I wonder if the community sees similar performance regressions on some
> > queries for the Lucene99PostingsFormat.
> >
> > This ES PR <https://github.com/elastic/elasticsearch/pull/103601> diverged
> > from Lucene. Lucene 9.9 introduced a new postings format that uses FOR
> > instead of PFOR. Elasticsearch prefers the previous (PFOR) format, so they
> > keep it as their own postings format here
> > <https://github.com/elastic/elasticsearch/tree/main/server/src/main/java/org/elasticsearch/index/codec/postings>.
> > However, ES cited only the index size increase as the reason.
> >
> > Thank you very much!
> >
>
>
> --
> Adrien
>
