Thanks for your prompt reply!

On Tue, Sep 10, 2024 at 1:38 PM Adrien Grand <jpou...@gmail.com> wrote:
> Can you clarify what you refer to by match-all and match-many queries?
> Lucene's MatchAllDocsQuery should not be impacted since it doesn't use
> postings for evaluation.

By match-all, I mean a query that matches all docs, e.g. a term query on the
term "A" when every doc contains the term "A". By match-many, I mean a query
that matches a high percentage of the total docs.

>
> Since FOR is a bit less space-efficient than PFOR, I guess it could be a
> bit slower if your Directory abstraction was a bit slow at reading data.
> Are you using Lucene's MMapDirectory?
>

Yes, we use mmap for the posting list index files.

>
> Elasticsearch indeed only retained PFOR for space-efficiency reasons. We
> have many indexes that use IndexOptions.DOCS where the move from PFOR to
> FOR significantly increased disk usage (unlike indexes that use
> IndexOptions.DOCS_AND_FREQS_AND_POSITIONS where space is typically
> dominated by positions anyway).
>

Got it. Thanks!

>
> On Tue, Sep 10, 2024 at 9:31 PM Rui Wu <rui...@mongodb.com.invalid> wrote:
>
> > Dear experts,
> >
> > I have a question about the following change:
> > Lucene 9.11 changed the postings list format
> > (Lucene GITHUB#12696 <https://github.com/apache/lucene/pull/12696>: Change
> > Postings back to using FOR in Lucene99PostingsFormat. Freqs, positions and
> > offset keep using PFOR)
> >
> > However, in our (MongoDB Atlas Search) internal performance testing, we saw
> > an increase in query latency of up to 32% on match-all and match-many
> > inverted-index-based queries, e.g. query.phrase-slop-0 and
> > query.date-facet-match-all.
> >
> > I wonder if the community sees similar performance regressions on some
> > queries with the Lucene99PostingsFormat.
> >
> > This ES PR <https://github.com/elastic/elasticsearch/pull/103601> diverged
> > from Lucene. Lucene 9.9 introduced a new postings format that uses FOR
> > instead of PFOR. Elasticsearch prefers the previous (PFOR-based) format,
> > so they introduced it as their own postings format here
> > <https://github.com/elastic/elasticsearch/tree/main/server/src/main/java/org/elasticsearch/index/codec/postings>.
> > However, the only reason ES cited was the increase in index size.
> >
> > Thank you very much!
> >
>
> --
> Adrien
>
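To make the FOR vs PFOR space trade-off discussed in this thread a bit more concrete, here is a toy Python size model for a single block of doc-ID deltas. It is a sketch under simplifying assumptions, not Lucene's actual on-disk layout: the 128-value block size, the cap of 7 patched exceptions (`MAX_EXCEPTIONS`), the 2-byte cost per exception, and the helper names (`bits_required`, `for_size_bits`, `pfor_size_bits`) are all hypothetical, only loosely inspired by Lucene's PFOR encoding.

```python
"""Toy size model comparing FOR and PFOR for one block of doc-ID deltas.

Assumptions (simplified, NOT Lucene's exact on-disk layout):
- a block holds BLOCK_SIZE values;
- FOR packs every value with the bit width of the block's largest value;
- PFOR picks a smaller width and "patches" up to MAX_EXCEPTIONS outliers,
  each costing 2 extra bytes (1 byte position + 1 byte of high bits).
"""

BLOCK_SIZE = 128    # assumed block size for this illustration
MAX_EXCEPTIONS = 7  # hypothetical cap on patched outliers per block


def bits_required(value: int) -> int:
    """Number of bits needed to store a non-negative value (at least 1)."""
    return max(1, value.bit_length())


def for_size_bits(block: list[int]) -> int:
    """FOR: every value uses the bit width of the block's maximum value."""
    return len(block) * bits_required(max(block))


def pfor_size_bits(block: list[int]) -> int:
    """PFOR: pick the width minimising packed bits + exception patch cost."""
    max_bits = bits_required(max(block))
    best = for_size_bits(block)  # zero exceptions is always a valid choice
    for width in range(1, max_bits):
        exceptions = [v for v in block if v.bit_length() > width]
        if len(exceptions) > MAX_EXCEPTIONS:
            continue  # too many outliers for this width
        # In this model the high bits of each exception must fit in one byte.
        if any((v >> width).bit_length() > 8 for v in exceptions):
            continue
        size = len(block) * width + len(exceptions) * 16  # 2 bytes each
        best = min(best, size)
    return best


if __name__ == "__main__":
    dense = [1] * BLOCK_SIZE              # match-all: every doc matches
    gappy = [3] * 125 + [200, 300, 1000]  # mostly small gaps, 3 outliers
    for name, block in [("dense", dense), ("gappy", gappy)]:
        print(name, "FOR:", for_size_bits(block), "bits,",
              "PFOR:", pfor_size_bits(block), "bits")
    # dense FOR: 128 bits, PFOR: 128 bits
    # gappy FOR: 1280 bits, PFOR: 304 bits
```

In this model a fully dense block (every delta is 1, as for a term that every doc contains) costs the same under both schemes, while a contrived block with a few large gaps is much smaller under PFOR. It only models space; it says nothing about decoding cost or I/O, which are what drive query latency.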