Hey all,

I have been working on an initial implementation of intra-segment search concurrency for Lucene.
My goal is to introduce the ability to concurrently search partitions of the same segment (think of a force-merged segment, for instance) in a way that is as transparent as possible to users. This way we can ideally decouple search concurrency from the index geometry, with the least impact on users.

As a first step, I decided not to tackle deduplicating the work that happens globally per segment, which every partition would repeat on its own. This is certainly an important area to improve upon, yet I am hoping that we can treat it as a follow-up, mostly because there is enough work to do even without addressing it.

After quite a few iterations, I have just marked my PR ready for review: https://github.com/apache/lucene/pull/13542. Tests are finally green. I wrote a rather detailed description on the PR itself that covers the problems I encountered, how I addressed them, and the way forward that I am proposing.

There are still a couple of rough edges, and we still need to align on terminology for the API. Mostly: what do we call a partition of a segment? Existing leaf slices are partitions of an index. We are now introducing partitions of segments that can be searched independently. I called them LeafReaderContextPartition, but I am not particularly attached to this specific name and am open to feedback. The new terminology only appears in the IndexSearcher#search method (which users do not call directly) and in the slice-related methods of IndexSearcher. Otherwise, users that just call search hopefully don't need to know what a segment partition is.

I'd love to collect enough feedback to agree on a path forward and get this merged for Lucene 10, as it requires some breaking API changes as well as changes in internal behaviour.

Looking forward to your feedback.

Cheers
Luca