Hey folks,

While investigating a regression between OpenSearch 2.17.1 (Lucene 9.11.1)
and 2.18.0 (Lucene 9.12.0) for a simple term query in the Big5 workload on
the process.name field, I noticed that the new Lucene912PostingsReader
creates its ImpactsEnum by wrapping the postings in a SlowImpactsEnum when
a field is indexed with only IndexOptions.DOCS.

curl -X POST "http://localhost:9200/big5/_search" \
  -H "Content-Type: application/json" \
  -d '{ "query": { "term": { "process.name": "kernel" } } }'


In Lucene912PostingsReader, ImpactsEnum impacts(FieldInfo fieldInfo,
BlockTermState state, int flags) has an extra check on *indexHasFreqs*, so a
DOCS-only field never reaches BlockImpactsDocsEnum:

if (state.docFreq >= BLOCK_SIZE
    && indexHasFreqs
    && (indexHasPositions == false
        || PostingsEnum.featureRequested(flags, PostingsEnum.POSITIONS) == false)) {
  return new BlockImpactsDocsEnum(fieldInfo, (IntBlockTermState) state);
}


Lucene99PostingsReader, by contrast, creates the faster BlockImpactsDocsEnum
for fields with IndexOptions.DOCS and only falls back to SlowImpactsEnum when
the document frequency is at most 128 (BLOCK_SIZE).


In Lucene99PostingsReader, ImpactsEnum impacts(FieldInfo fieldInfo,
BlockTermState state, int flags) has no indexHasFreqs check:

if (state.docFreq <= BLOCK_SIZE) {
  // no skip data
  return new SlowImpactsEnum(postings(fieldInfo, state, null, flags));
}


if (indexHasPositions == false
    || PostingsEnum.featureRequested(flags, PostingsEnum.POSITIONS) == false) {
  return new BlockImpactsDocsEnum(fieldInfo, (IntBlockTermState) state);
}
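To compare the two branches side by side, here is a small, simplified model
of which enum each reader's impacts(...) would return, transcribed only from
the two snippets above. It is a sketch, not Lucene code: Opts is a
hypothetical stand-in for IndexOptions, the enums are reported as strings,
and branches not shown in the snippets are reported as "other (not shown)".
The assumption that a DOCS-only field falls through to SlowImpactsEnum in
9.12 comes from the behavior described in this report.

```java
/** Simplified model (not Lucene code) of the impacts(...) branch selection. */
public class ImpactsEnumChoice {
  // Hypothetical stand-in for Lucene's IndexOptions, ordered the same way.
  enum Opts { DOCS, DOCS_AND_FREQS, DOCS_AND_FREQS_AND_POSITIONS }

  static final int BLOCK_SIZE = 128;

  // Lucene99PostingsReader (Lucene 9.11.1), per the snippet above.
  static String lucene99(int docFreq, Opts opts, boolean positionsRequested) {
    boolean indexHasPositions = opts.compareTo(Opts.DOCS_AND_FREQS_AND_POSITIONS) >= 0;
    if (docFreq <= BLOCK_SIZE) {
      return "SlowImpactsEnum"; // no skip data
    }
    if (indexHasPositions == false || positionsRequested == false) {
      return "BlockImpactsDocsEnum"; // note: no indexHasFreqs check
    }
    return "other (not shown)";
  }

  // Lucene912PostingsReader (Lucene 9.12.0), per the snippet above.
  static String lucene912(int docFreq, Opts opts, boolean positionsRequested) {
    boolean indexHasFreqs = opts.compareTo(Opts.DOCS_AND_FREQS) >= 0;
    boolean indexHasPositions = opts.compareTo(Opts.DOCS_AND_FREQS_AND_POSITIONS) >= 0;
    if (docFreq >= BLOCK_SIZE
        && indexHasFreqs
        && (indexHasPositions == false || positionsRequested == false)) {
      return "BlockImpactsDocsEnum";
    }
    // Assumption: a non-positional field falls through to SlowImpactsEnum,
    // as described in this report; positional branches are not modeled.
    return indexHasPositions ? "other (not shown)" : "SlowImpactsEnum";
  }

  public static void main(String[] args) {
    int df = 1_000_000; // illustrative docFreq for a frequent term
    for (Opts o : new Opts[] {Opts.DOCS, Opts.DOCS_AND_FREQS}) {
      System.out.println(o + ": 9.11 -> " + lucene99(df, o, false)
          + ", 9.12 -> " + lucene912(df, o, false));
    }
    // DOCS: 9.11 -> BlockImpactsDocsEnum, 9.12 -> SlowImpactsEnum
    // DOCS_AND_FREQS: 9.11 -> BlockImpactsDocsEnum, 9.12 -> BlockImpactsDocsEnum
  }
}
```

The model also surfaces a small boundary difference: at exactly
docFreq == 128, the 9.11 condition (<=) picks SlowImpactsEnum while the 9.12
condition (>=) picks BlockImpactsDocsEnum for a field with freqs.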



Because Lucene 9.12.0 wraps the postings in a SlowImpactsEnum, whose
advanceShallow method is a no-op, the term query is never able to skip data
when it is driven by the bulk scorer via DISI#nextDoc(). In Lucene 9.11.1,
by contrast, advanceShallow is used and skips over a large number of docs,
so the query completes much faster.
On the Big5 index with 116 million docs, latency goes from <=5ms in Lucene
9.11.1 to >200ms in Lucene 9.12.0.
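To make the effect concrete, here is a toy model (not Lucene's actual
scoring code) of block-max skipping for a top-k term query in which every
matching doc has the same score, which is the assumed situation for a
DOCS-only field. With a per-block upper bound, once k hits are collected
every later block can be ruled out without decoding it; with no per-block
bound, which is how the model treats a no-op advanceShallow, every doc must
be visited. The class name, single-level blocks, and tie-breaking via
Math.nextUp are simplifying assumptions; the model ignores total-hit
counting and multi-level skip data.

```java
/** Toy model (NOT Lucene code) of why a no-op advanceShallow defeats skipping. */
public class ShallowSkipModel {
  static final int BLOCK_SIZE = 128;

  /** Counts docs visited while collecting the top k of numDocs equally scored matches. */
  static int visitedDocs(int numDocs, int k, boolean hasBlockImpacts) {
    final float score = 1.0f; // assumption: every match scores the same
    float minCompetitive = 0f;
    int collected = 0;
    int visited = 0;
    int doc = 0;
    while (doc < numDocs) {
      // Upper bound for the block containing doc. Without block impacts the
      // model has one unbounded "block" spanning all remaining docs.
      float blockMax = hasBlockImpacts ? score : Float.POSITIVE_INFINITY;
      int blockEnd = hasBlockImpacts
          ? Math.min(numDocs, (doc / BLOCK_SIZE + 1) * BLOCK_SIZE)
          : numDocs;
      if (blockMax < minCompetitive) {
        doc = blockEnd; // skip the whole block without visiting its docs
        continue;
      }
      visited++;
      if (collected < k) {
        collected++;
        if (collected == k) {
          // Ties broken by doc ID: a later doc with an equal score cannot compete.
          minCompetitive = Math.nextUp(score);
        }
      }
      doc++;
    }
    return visited;
  }

  public static void main(String[] args) {
    int numDocs = 1_000_000;
    System.out.println("with block impacts: " + visitedDocs(numDocs, 10, true));     // 10
    System.out.println("no-op advanceShallow: " + visitedDocs(numDocs, 10, false)); // 1000000
  }
}
```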

I tried reindexing process.name into another index with DOCS_AND_FREQS
enabled, and the query latency returned to normal, since that field now uses
BlockImpactsDocsEnum as its ImpactsEnum.

Is this a bug in the Lucene912 postings reader? Or is it not possible to use
BlockImpactsDocsEnum with the new postings format?


Thanks,
Aniketh
