Re: [PR] Bump block size of postings to 256. [lucene]

via GitHub Fri, 05 Sep 2025 10:50:49 -0700


jpountz commented on PR #15160:
URL: https://github.com/apache/lucene/pull/15160#issuecomment-3258969230


   Even though 512 performs better on benchmarks, I'm leaning towards going 
with 256:
    - memory usage should be taken into account as well (each PostingsEnum 
maintains 3 arrays sized according to this block size: doc IDs, term 
frequencies, and a temporary array used for decoding)
    - I like the simplicity of 256 where we can keep encoding indexes in the 
array as bytes. E.g. skip data for positions needs to record the offset in the 
positions block of the first position of the first doc ID in the doc block, 
and PFOR's patches need to record the indexes where exceptions happen. It's a 
bit simpler if these can be encoded as bytes.
    - 256 is a bit safer wrt regressions: most queries get faster but as we 
see with `FilteredTerm`, some queries also get slower.
    - I only remember seeing 128 or 256 as block sizes in the IR literature. I 
have a bias towards not diverging too much from what is documented in the 
literature.
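
   To make the memory and byte-encoding points concrete, here is a rough 
sketch. The class and method names are hypothetical and not Lucene APIs, and 
it assumes each of the 3 per-enum arrays holds 4-byte ints, which may not 
match the actual implementation:

```java
public final class BlockSizeSketch {

    // Store an index into a block in a single byte. Only valid for 0..255,
    // i.e. for block sizes up to 256.
    public static byte encodeIndex(int index) {
        if (index < 0 || index > 0xFF) {
            throw new IllegalArgumentException("index does not fit in a byte: " + index);
        }
        return (byte) index;
    }

    // Java bytes are signed, so read the value back as unsigned.
    public static int decodeIndex(byte b) {
        return Byte.toUnsignedInt(b);
    }

    // The largest index into a block is blockSize - 1.
    public static boolean indexesFitInByte(int blockSize) {
        return blockSize - 1 <= 0xFF;
    }

    // Assumption: 3 int arrays per enum (doc IDs, term frequencies, scratch).
    public static long bufferBytesPerEnum(int blockSize) {
        return 3L * blockSize * Integer.BYTES;
    }

    public static void main(String[] args) {
        for (int blockSize : new int[] {128, 256, 512}) {
            System.out.println(blockSize
                + ": indexes fit in a byte = " + indexesFitInByte(blockSize)
                + ", buffer bytes per enum = " + bufferBytesPerEnum(blockSize));
        }
    }
}
```

   Under these assumptions, 256 is the largest block size whose indexes still 
fit in one byte, and the buffer footprint per enum doubles with each doubling 
of the block size.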


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


