Can you clarify what you mean by match-all and match-many queries?
Lucene's MatchAllDocsQuery should not be impacted since it doesn't use
postings for evaluation.

Since FOR is slightly less space-efficient than PFOR, I guess it could be
somewhat slower if your Directory abstraction is slow at reading data.
Are you using Lucene's MMapDirectory?

Elasticsearch indeed only retained PFOR for space-efficiency reasons. We
have many indexes that use IndexOptions.DOCS where the move from PFOR to
FOR significantly increased disk usage (unlike indexes that use
IndexOptions.DOCS_AND_FREQS_AND_POSITIONS where space is typically
dominated by positions anyway).
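
To make the space trade-off concrete, here is a rough sketch of a per-block
cost model. It is NOT Lucene's actual encoder: the 128-value block size, the
cap of 7 exceptions, the 2 bytes per exception and the 1-byte header are all
assumptions chosen for illustration, and the class and method names are
hypothetical.

```java
import java.util.Arrays;

// Illustrative cost model only; constants and names are assumptions, not Lucene APIs.
final class ForVsPforSketch {
    static final int MAX_EXCEPTIONS = 7;      // assumed cap on patched values per block
    static final int BYTES_PER_EXCEPTION = 2; // assumed: 1 byte position + 1 byte of high bits
    static final int HEADER_BYTES = 1;        // assumed: 1 byte describing the block

    // Number of bits needed to represent v (at least 1).
    static int bitsRequired(long v) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(v));
    }

    // FOR: every value in the block is packed with the width of the largest value.
    static int forBytes(long[] block) {
        int maxBits = 0;
        for (long v : block) {
            maxBits = Math.max(maxBits, bitsRequired(v));
        }
        return HEADER_BYTES + (block.length * maxBits + 7) / 8;
    }

    // Simplified PFOR: pack with a narrower width and store a few outliers separately.
    static int pforBytes(long[] block) {
        long[] sorted = block.clone();
        Arrays.sort(sorted);
        int maxBits = bitsRequired(sorted[sorted.length - 1]);
        int best = forBytes(block); // zero exceptions is plain FOR
        for (int ex = 1; ex <= MAX_EXCEPTIONS && ex < sorted.length; ex++) {
            // Width needed once the `ex` largest values are treated as exceptions.
            int bits = bitsRequired(sorted[sorted.length - 1 - ex]);
            if (maxBits - bits > 8) {
                continue; // the high bits of an exception must fit in one byte
            }
            int size = HEADER_BYTES + (block.length * bits + 7) / 8
                    + ex * BYTES_PER_EXCEPTION;
            best = Math.min(best, size);
        }
        return best;
    }

    public static void main(String[] args) {
        long[] block = new long[128];
        Arrays.fill(block, 3L); // small doc ID deltas
        block[5] = 1000L;       // a single large outlier
        System.out.println("FOR bytes:  " + forBytes(block));
        System.out.println("PFOR bytes: " + pforBytes(block));
    }
}
```

Under these assumptions, a single large delta forces FOR to widen all 128
values in the block, while the PFOR-style model pays only for the outlier.
When a block has no outliers, the two come out the same size.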

On Tue, Sep 10, 2024 at 9:31 PM Rui Wu <rui...@mongodb.com.invalid> wrote:

> Dear experts,
>
> I have a question about the following change:
> Lucene 9.11 changed the postings list format
> (Lucene GITHUB#12696 <https://github.com/apache/lucene/pull/12696>: Change
> Postings back to using FOR in Lucene99PostingsFormat. Freqs, positions and
> offset keep using PFOR)
>
> However, in our (MongoDB Atlas Search) internal performance testing, we saw
> query latency increase by up to 32% on match-all and match-many inverted
> index based queries, e.g. query.phrase-slop-0 and
> query.date-facet-match-all.
>
>
> I wonder if the community sees similar performance regressions on some
> queries for the Lucene99PostingsFormat.
>
> This ES PR <https://github.com/elastic/elasticsearch/pull/103601> diverged
> from Lucene. Lucene 9.9 introduced a new postings format that uses FOR
> instead of PFOR. Elasticsearch prefers the previous PFOR-based format, so
> they keep it as their own postings format here
> <https://github.com/elastic/elasticsearch/tree/main/server/src/main/java/org/elasticsearch/index/codec/postings>.
> However, ES cited index size increase as the only reason.
>
> Thank you very much!
>


-- 
Adrien
