Hello everyone,

We are in the process of upgrading from Lucene 8.5.0, and on the latest version our query performance tests show significant latency degradation for one of our important use cases. In this test, each query retrieves a relatively large result set of 40k documents, each with a small stored fields payload (< 100 bytes per doc). It looks like the change that affects this use case was introduced in LUCENE-9486 <https://issues.apache.org/jira/browse/LUCENE-9486> (Lucene 8.7); on that version our tests show almost 3 times higher latency. Later, LUCENE-9917 <https://issues.apache.org/jira/browse/LUCENE-9917> reduced the block size for BEST_SPEED, and since Lucene 8.10 we see about 30% degradation.
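To make the trade-off more concrete, here is a rough back-of-envelope model of how much data must be decompressed to fetch documents from compressed stored-field blocks. As a simplification, it assumes that reading one document means decompressing the preset dictionary (when one is used) plus the whole sub-block that holds the document. It also assumes every fetched document sits in a different block, so nothing is reused between fetches. The `BlockLayout` record, the `totalBytes` helper and all byte sizes are hypothetical placeholders for illustration only. They are not Lucene's actual defaults or APIs.

```java
import java.util.List;

public class StoredFieldsCostModel {

    /** Hypothetical layout of one compressed block; sizes are illustrative placeholders. */
    record BlockLayout(String name, int dictBytes, int subBlockBytes) {
        /**
         * Bytes decompressed to read one document, assuming the reader first
         * decompresses the preset dictionary (0 if none) and then the whole
         * sub-block that contains the document.
         */
        long bytesPerFetch() {
            return (long) dictBytes + subBlockBytes;
        }
    }

    /** Worst case: every fetched document is in a different block, so no work is reused. */
    static long totalBytes(BlockLayout layout, int docsFetched) {
        return layout.bytesPerFetch() * docsFetched;
    }

    public static void main(String[] args) {
        int docsFetched = 40_000; // result size from the use case described above
        List<BlockLayout> layouts = List.of(
                new BlockLayout("preset dict, large sub-blocks", 8 * 1024, 48 * 1024),
                new BlockLayout("no dict, small blocks", 0, 8 * 1024));
        for (BlockLayout l : layouts) {
            System.out.printf("%-32s %,d bytes per fetch, %,d bytes total%n",
                    l.name(), l.bytesPerFetch(), totalBytes(l, docsFetched));
        }
    }
}
```

With these placeholder numbers, the small dictionary-free layout decompresses one seventh as many bytes per fetch. Real costs also depend on how many fetched documents share a block and on decompression speed, so the model only shows the direction of the effect, not its size.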
This is still a significant performance regression, and in our case query latency is more important than index size. Unless I'm missing something, the only way to fix this today is to introduce our own Codec, StoredFieldsFormat and CompressionMode. An experiment with the preset dictionary disabled and a lower block size showed that these changes let us achieve the query latency we need on Lucene 9.2. While this solves the problem, we are concerned about maintaining our own version of the codec and about more complicated upgrades in the future.

Are there any less obvious ways to improve the situation for this use case? If not, would it make sense to expose the related settings so users can tune compression without copying several internal classes?

Thank you,
Alex
